Error when checking target: expected dense_18 to have shape (1,) but got array with shape (10,) - tensorflow

Y_train = to_categorical(Y_train, num_classes = 10)#
random_seed = 2
X_train,X_val,Y_train,Y_val = train_test_split(X_train, Y_train, test_size = 0.1, random_state=random_seed)
Y_train.shape
model = Sequential()
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer = 'adam', loss = 'sparse_categorical_crossentropy',metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size = 86, epochs = 3,validation_data = (X_val, Y_val), verbose =2)
I have to classify the MNIST data into 10 classes. I am converting Y_train into a one-hot encoded array. I have gone through a number of answers, but none have helped. Kindly guide me in this regard, as I am a novice in ML and neural networks.

It seems there is no need to use model.add(Flatten()) as your first layer. Instead, you can use a dense layer with a specific input size, like: model.add(Dense(64, input_shape=your_input_shape, activation="relu")).
To check whether the issue is caused by the layers, you can verify in a Jupyter notebook that the to_categorical() function works on its own.
Updated Answer
Before building the model, you should reshape your input data, in this case from 28*28 to 784.
train_images = train_images.reshape((-1, 784))
test_images = test_images.reshape((-1, 784))
I also suggest normalizing the data, which can be done by simply dividing the images by 255.
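A minimal sketch of the reshape and normalization steps, using a random uint8 array as a stand-in for the MNIST images (the stand-in data and the sample count of 4 are assumptions made only for illustration):

```python
import numpy as np

# Stand-in for MNIST images: 4 samples of 28x28 pixels with values 0..255.
rng = np.random.default_rng(0)
train_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into a 784-long vector.
train_images = train_images.reshape((-1, 784))

# Convert to float before dividing so the values end up in [0, 1].
train_images = train_images.astype("float32") / 255.0

print(train_images.shape)  # (4, 784)
```

Casting to float32 first matters: dividing while the array is still an integer type would either fail in place or silently promote to float64.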
After that step you should create your model.
model = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax'),
])
Notice input_shape=(784,): that is the shape of your flattened input.
Last step: compiling and fitting.
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
model.fit(
    train_images,
    train_labels,
    epochs=10,
    batch_size=16,
)
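The error itself comes from how the labels and the loss are paired: to_categorical() produces one-hot targets of shape (10,), while sparse_categorical_crossentropy expects integer class labels of shape (1,). With one-hot labels, use categorical_crossentropy as above. In addition, your model flattened the input without ever being told the input shape; you can instead reshape your inputs manually and feed them to the Dense() layers, passing input_shape to the first one.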
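The shape mismatch named in the error message can be reproduced with NumPy alone. The sketch below is not Keras code; it assumes four made-up labels and random predicted probabilities, and only illustrates that integer ("sparse") labels and one-hot labels describe the same targets in different shapes and yield the same per-sample cross-entropy:

```python
import numpy as np

num_classes = 10

# Integer ("sparse") labels: one class index per sample, shape (N,).
y_sparse = np.array([3, 0, 9, 1])
# One-hot labels: one row of length num_classes per sample, shape (N, 10).
y_onehot = np.eye(num_classes)[y_sparse]

# Made-up predicted probabilities; each row is normalized to sum to 1.
rng = np.random.default_rng(0)
probs = rng.random((4, num_classes))
probs /= probs.sum(axis=1, keepdims=True)

# Sparse form: look up the probability of the true class by index.
loss_sparse = -np.log(probs[np.arange(4), y_sparse])
# Categorical form: mask with the one-hot row and sum over classes.
loss_onehot = -(y_onehot * np.log(probs)).sum(axis=1)

print(y_sparse.shape, y_onehot.shape)
print(np.allclose(loss_sparse, loss_onehot))
```

So either keep the integer labels and use sparse_categorical_crossentropy, or one-hot encode them and use categorical_crossentropy; mixing the two gives the "expected shape (1,) but got (10,)" error.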

Validation loss decreasing but validation accuracy is fluctuating

I am training my first ML model. I am working on a 10-class classification problem. From what I can see, the model is overfitting since there is a significant difference between the training and validation accuracy.
This is the relevant code for the model
model = keras.Sequential()
model.add(keras.Input(shape=(x_train[0].shape)))
model.add(tf.keras.layers.Conv2D(filters=32,kernel_size=3, strides = (3, 3), padding = "same", activation = "relu", kernel_regularizer=tf.keras.regularizers.l1_l2(0.01)))
model.add(tf.keras.layers.MaxPool2D(strides=2))
model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(3,3), padding='valid', activation='relu', kernel_regularizer=tf.keras.regularizers.l1_l2(0.01)))
model.add(tf.keras.layers.MaxPool2D(strides=2))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(10))
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001/2)
model.summary()
model.compile(optimizer=optimizer,
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs = 30, validation_data = (x_val, y_val), callbacks=tf.keras.callbacks.EarlyStopping(verbose=1, patience=4))
There are large fluctuations in the validation accuracy and I am not sure why.
I have tried augmenting the data and have also injected noise into the training data. (This is an audio classification problem with 10 different classes)
https://i.stack.imgur.com/TXe50.png

Keras - Trying to get 'logits' - one layer before the softmax activation function

I'm trying to get the 'logits' out of my Keras CNN classifier.
I have tried the suggested method here: link.
First I created two models to check the implementation:
1. create_CNN_MNIST: a CNN classifier that returns the softmax probabilities.
2. create_CNN_MNIST_logits: a CNN with the same layers as in (1), with a small twist in the last layer: the activation function is changed to linear so that it returns logits.
Both models were fed the same MNIST train and test data. When I then applied softmax to the logits, I got a different output from the softmax CNN.
I couldn't find a problem in my code. Maybe you could help advise another method to extract 'logits' from the model?
the code:
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)
def create_CNN_MNIST_logits():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='linear'))
    # compile model
    opt = SGD(learning_rate=0.01, momentum=0.9)
    def my_sparse_categorical_crossentropy(y_true, y_pred):
        return keras.losses.categorical_crossentropy(y_true, y_pred, from_logits=True)
    model.compile(optimizer=opt, loss=my_sparse_categorical_crossentropy, metrics=['accuracy'])
    return model
def create_CNN_MNIST():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    # compile model
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
# load data
X_train = np.load('./data/X_train.npy')
X_test = np.load('./data/X_test.npy')
y_train = np.load('./data/y_train.npy')
y_test = np.load('./data/y_test.npy')
#create models
model_softmax = create_CNN_MNIST()
model_logits = create_CNN_MNIST_logits()
pixels = 28
channels = 1
num_labels = 10
# Reshaping to format which CNN expects (batch, height, width, channels)
trainX_cnn = X_train.reshape(X_train.shape[0], pixels, pixels, channels).astype('float32')
testX_cnn = X_test.reshape(X_test.shape[0], pixels, pixels, channels).astype('float32')
# Normalize images from 0-255 to 0-1
trainX_cnn /= 255
testX_cnn /= 255
train_y_cnn = utils.to_categorical(y_train, num_labels)
test_y_cnn = utils.to_categorical(y_test, num_labels)
#train the models:
model_logits.fit(trainX_cnn, train_y_cnn, validation_split=0.2, epochs=10,
batch_size=32)
model_softmax.fit(trainX_cnn, train_y_cnn, validation_split=0.2, epochs=10,
batch_size=32)
At the evaluation stage, I apply softmax to the logits to check whether the result is the same as the regular model's:
#predict
y_pred_softmax = model_softmax.predict(testX_cnn)
y_pred_logits = model_logits.predict(testX_cnn)
#apply softmax on the logits to get the same result of regular CNN
y_pred_logits_activated = softmax(y_pred_logits)
Now I get different values in both y_pred_logits_activated and y_pred_softmax that lead to different accuracy on the test set.
Your models are probably being trained differently. Make sure to set the seed before building and fitting each model so that they are initialised with the same weights and use the same train/val split. Also, the softmax may be incorrect:
def softmax(x):
    """Compute softmax values for each set of scores in x."""
    e_x = np.exp(x)
    # keepdims=True so the row sums broadcast against [batch, outputs]
    return e_x / e_x.sum(axis=1, keepdims=True)
This is mathematically equivalent to subtracting the max (https://stackoverflow.com/a/34969389/10475762), which only improves numerical stability, and the axis should be 1 if your matrix is of shape [batch, outputs].
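As a sketch of the axis point, here is a row-wise softmax that also keeps the max subtraction for numerical stability. The function name softmax_rows and the sample logits are assumptions for illustration; the second row is chosen so that a plain np.exp would overflow:

```python
import numpy as np

def softmax_rows(x):
    """Row-wise softmax for an array of shape [batch, outputs]."""
    # Subtracting each row's max does not change the result,
    # but it keeps np.exp from overflowing on large logits.
    shifted = x - np.max(x, axis=1, keepdims=True)
    e_x = np.exp(shifted)
    # keepdims=True keeps the sums as [batch, 1] so they divide row by row.
    return e_x / e_x.sum(axis=1, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1000.0, 1000.0]])
probs = softmax_rows(logits)
print(probs.sum(axis=1))  # each row sums to 1
```

Note that the question's version takes np.max over the whole array and sums over axis 0, so it normalizes across the batch rather than across the classes of each sample.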

Keras layer shape incompatibility for a small MLP

I have a simple MLP built in Keras. The shapes of my inputs are:
X_train.shape - (6, 5)
Y_train.shape - 6
# Create the model
model = Sequential()
model.add(Dense(32, input_shape=(X_train.shape[0],), activation='relu'))
model.add(Dense(Y_train.shape[0], activation='softmax'))
# Compile and fit
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=10, batch_size=1, verbose=1, validation_split=0.2)
# Get output vector from softmax
output = model.layers[-1].output
This gives me the error:
ValueError: Error when checking input: expected dense_1_input to have shape (6,) but got array with shape (5,).
I have two questions:
Why do I get the above error and how can I solve it?
Is output = model.layers[-1].output the way to return the softmax vector for a given input vector? I haven't ever done this in Keras.
In the input layer, use input_shape=(X_train.shape[1],), since input_shape describes a single sample (the number of features), not the number of samples. The last layer must have as many units as there are classes to predict.
The way to return the softmax vector is model.predict(X).
Here is a complete example:
n_sample = 5
n_class = 2
X = np.random.uniform(0,1, (n_sample,6))
y = np.random.randint(0,n_class, n_sample)
model = Sequential()
model.add(Dense(32, input_shape=(X.shape[1],), activation='relu'))
model.add(Dense(n_class, activation='softmax'))
# Compile and fit
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=10, batch_size=1, verbose=1)
# Get output vector from softmax
model.predict(X)
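The original error message follows directly from which axis is which. A small NumPy sketch, assuming an all-zeros array with the same (6, 5) shape as the question's X_train:

```python
import numpy as np

# Same shape as in the question: 6 samples, 5 features each.
X_train = np.zeros((6, 5))

n_samples = X_train.shape[0]   # 6: the batch axis, not part of input_shape
n_features = X_train.shape[1]  # 5: the size of one sample

# input_shape describes one sample, so it uses the feature count.
input_shape = (n_features,)
print(n_samples, input_shape)  # 6 (5,)
```

Using X_train.shape[0] told the model to expect 6 features per sample, while each sample actually has 5, hence "expected shape (6,) but got array with shape (5,)".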

How do I reshape input layer for Conv1D in Keras?

I have looked at various responses already, but I don't understand why I am constantly getting (10, 5).
Why is it asking for a shape of (10,5)? Where is it even getting that number from?
I am under the impression that the shape of the input data should be ("sample_size", "steps or time_len", "channels or feat_size") => (3809, 49, 5).
I am also under the impression that the input shape for Conv1D layer should be ("steps or time_len", "channels or feat_size").
Am I misunderstanding something?
My input data looks something like this:
There are 49 days in total, with 5 data points per day. The total sample size is 5079: 75% of the data is used for training and 25% for validation. There are 10 possible prediction outputs.
x_train, x_test, y_train, y_test = train_test_split(np_train_data, np_train_target, random_state=0)
print(x_train.shape)
x_train = x_train.reshape(x_train.shape[0], round(x_train.shape[1]/5), 5)
x_test = x_test.reshape(x_test.shape[0], round(x_test.shape[1]/5), 5)
print(x_train.shape)
input_shape = (round(x_test.shape[1]/5), 5)
model = Sequential()
model.add(Conv1D(100, 2, activation='relu', input_shape=input_shape))
model.add(MaxPooling1D(3))
model.add(Conv1D(100, 2, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(49, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=64, epochs=2, validation_data=(x_test, y_test))
print(model.summary())
I get this error:
Print out of layers
You are dividing by 5 twice. Here you are reshaping your data, which is necessary, contrary to what the other answer says:
x_train = x_train.reshape(x_train.shape[0], round(x_train.shape[1]/5), 5)
x_test = x_test.reshape(x_test.shape[0], round(x_test.shape[1]/5), 5)
This already takes care of "dividing the time by 5". But here you are defining the input shape to the model, dividing by 5 again:
input_shape = (round(x_test.shape[1]/5), 5)
Simply use
input_shape = (x_test.shape[1], 5)
instead! Note that because this expression is evaluated after the reshape, it already refers to the correct shape, with the time dimension divided by 5.
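The arithmetic behind the (10, 5) can be checked with NumPy alone. This sketch assumes the flat data has 49 * 5 = 245 columns per sample, as the question's description implies, and uses 8 samples as an arbitrary stand-in for the real sample count:

```python
import numpy as np

n_samples, days, feats = 8, 49, 5   # 8 samples is an arbitrary stand-in
flat = np.zeros((n_samples, days * feats))           # shape (8, 245)

# The reshape from the question: the time axis becomes 245 / 5 = 49.
reshaped = flat.reshape(flat.shape[0], round(flat.shape[1] / feats), feats)
print(reshaped.shape)                                 # (8, 49, 5)

# Dividing the already-reshaped time axis again: round(49 / 5) = round(9.8) = 10.
wrong_input_shape = (round(reshaped.shape[1] / feats), feats)
# Reading the time axis as it is after the reshape.
right_input_shape = (reshaped.shape[1], feats)
print(wrong_input_shape, right_input_shape)           # (10, 5) (49, 5)
```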
You are using Conv1D but, by reshaping, trying to represent your data in 2D, which causes a problem. Try skipping the reshaping step, so that your input is a single row with 49 values:
x_train, x_test, y_train, y_test = train_test_split(np_train_data, np_train_target, random_state=0)
print(x_train.shape)
input_shape = (x_test.shape[1], 1)
model = Sequential()
model.add(Conv1D(100, 2, activation='relu', input_shape=input_shape))
model.add(MaxPooling1D(3))
model.add(Conv1D(100, 2, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(49, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=64, epochs=2, validation_data=(x_test, y_test))

How to customise a CNN layers with TensorFlow 2, Feed new inputs at Dense Layers of CNN [duplicate]

I have 1D sequences which I want to use as input to a Keras VGG classification model, split into x_train and x_test. For each sequence, I also have custom features stored in feats_train and feats_test which I do not want to input to the convolutional layers, but to the first fully connected layer.
A complete sample of train or test would thus consist of a 1D sequence plus n floating point features.
What is the best way to feed the custom features first to the fully connected layer? I thought about concatenating the input sequence and the custom features, but I do not know how to make them separate inside the model. Are there any other options?
The code without the custom features:
x_train, x_test, y_train, y_test, feats_train, feats_test = load_balanced_datasets()
model = Sequential()
model.add(Conv1D(10, 5, activation='relu', input_shape=(timesteps, 1)))
model.add(Conv1D(10, 5, activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.5, seed=789))
model.add(Conv1D(5, 6, activation='relu'))
model.add(Conv1D(5, 6, activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.5, seed=789))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5, seed=789))
model.add(Dense(2, activation='softmax'))
model.compile(loss='logcosh', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=batch_size, epochs=20, shuffle=False, verbose=1)
y_pred = model.predict(x_test)
The Sequential model is not very flexible. You should look into the functional API.
I would try something like this:
from keras.layers import (Conv1D, MaxPool1D, Dropout, Flatten, Dense,
                          Input, concatenate)
from keras.models import Model, Sequential
timesteps = 50
n = 5
def network():
    sequence = Input(shape=(timesteps, 1), name='Sequence')
    features = Input(shape=(n,), name='Features')
    conv = Sequential()
    conv.add(Conv1D(10, 5, activation='relu', input_shape=(timesteps, 1)))
    conv.add(Conv1D(10, 5, activation='relu'))
    conv.add(MaxPool1D(2))
    conv.add(Dropout(0.5, seed=789))
    conv.add(Conv1D(5, 6, activation='relu'))
    conv.add(Conv1D(5, 6, activation='relu'))
    conv.add(MaxPool1D(2))
    conv.add(Dropout(0.5, seed=789))
    conv.add(Flatten())
    part1 = conv(sequence)
    merged = concatenate([part1, features])
    final = Dense(512, activation='relu')(merged)
    final = Dropout(0.5, seed=789)(final)
    final = Dense(2, activation='softmax')(final)
    model = Model(inputs=[sequence, features], outputs=[final])
    model.compile(loss='logcosh', optimizer='adam', metrics=['accuracy'])
    return model
m = network()