Adapt Text Classifier Neural Net To Accept Multiple Categories - tensorflow

I'm trying to adjust the text classifier neural net in this Keras/Tensorflow tutorial to output multiple (more than 2 categories). I think I can change the output layer to use a 'softmax' activation but I'm not sure how to adjust the input layer.
Tutorial Link:
https://www.tensorflow.org/tutorials/keras/basic_text_classification
The tutorial is using movie review data and the only two categories are positive or negative so the model only uses an output layer with activation set to 'sigmoid'.
I have 16 categories represented using one-hot encoding.
Tutorial Example:
vocab_size = 10000
model = keras.Sequential()
model.add(keras.layers.Embedding(vocab_size, 16))
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(16, activation=tf.nn.relu))
model.add(keras.layers.Dense(1, activation=tf.nn.sigmoid))
My Attempt:
model.add(keras.layers.Embedding(10000, 16))
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(16, activation=tf.nn.relu))
model.add(keras.layers.Dense(16, activation='softmax'))
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['acc'])
history = model.fit(x_train[:5000],
y_train[:5000],
epochs=1,
batch_size=256,
validation_data=(x_train[5000:], y_train[5000:]),
verbose=1)
Error:
ValueError: A target array with shape (5000, 1) was passed for an output of shape (None, 16) while using as loss binary_crossentropy. This loss expects targets to have the same shape as the output.
Data Shapes:
x_train[:5000] (5000, 2000)
y_train[:5000] (5000,16)
x_train[5000:] (1934, 2000)
y_train[5000:] (1934,16)
Model Summary:
Model: "sequential_16"
Layer (type) Output Shape Param #
embedding_15 (Embedding) (None, None, 16) 160000
global_average_pooling1d_15 (None, 16) 0
dense_30 (Dense) (None, 16) 272
dense_31 (Dense) (None, 16) 272
Total params: 160,544
Trainable params: 160,544
Non-trainable params: 0

Binary_crossentropy is for binary classification, but what you're looking for is categorical_crossentropy. Binary_crossentropy expects your y matrix to be (n_samples x 1), with values of 0 or 1. Categorical_crossentropy expects your y matrix to be (n_samples x n_categories), with the correct category labeled as 1 and the other categories labeled as 0. It sounds like the way you did one-hot-encoding would be correct, so you probably just need to change your loss function.
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

Related

Dimension of output in Dense layer Keras

I have the sample following model
from tensorflow.keras import models
from tensorflow.keras import layers
sample_model = models.Sequential()
sample_model.add(layers.Dense(32, input_shape=(4,)))
sample_model.add(layers.Dense(16, input_shape = (44,)))
sample_model.compile(loss="binary_crossentropy",
optimizer="adam", metrics = ["accuracy"])
IP for the model:
sam_x = np.random.rand(10,4)
sam_y = np.array([0,1,1,0,1,0,0,1,0,1,])
sample_model.fit(sam_x,sam_y)
The confusion is the fit should have thrown an error of shape mismatch as the expected_input_shape for the 2nd Dense Layer is given as (None,44) but the output for the 1st Dense Layer (which is the input of the 2nd Dense Layer) will be of shape (None,32). But it ran successfully.
I don't understand why there was no error. Any clarifications will be helpful
The input_shape keyword argument has an effect only on the first layer of a Sequential. The shape of the input of the other layers will be derived from their previous layer.
That behaviour is hinted in the doc of tf.keras.layers.InputShape:
When using InputLayer with Keras Sequential model, it can be skipped by moving the input_shape parameter to the first layer after the InputLayer.
And in the Sequential Model guide.
The behaviour can be confirmed by looking at the source of the Sequential.add method:
if not self._layers:
if isinstance(layer, input_layer.InputLayer):
# Case where the user passes an Input or InputLayer layer via `add`.
set_inputs = True
else:
batch_shape, dtype = training_utils.get_input_shape_and_dtype(layer)
if batch_shape:
# Instantiate an input layer.
x = input_layer.Input(
batch_shape=batch_shape, dtype=dtype, name=layer.name + '_input')
# This will build the current layer
# and create the node connecting the current layer
# to the input layer we just created.
layer(x)
set_inputs = True
If there is no layers yet in the model, then an Input will be added to the model with the shape derived from the first layer of the model. This is done only if no layer is present yet in the model.
That shape is either fully known (if input_shape has been passed to the first layer of the model) or will be fully known once the model is built (for example, with a call to model.build(input_shape)).
The thing is after checking the input shape of the model from the first layer, it won't check or deal with other declared input shape inside that same model. For example, if you write your model the following way
sample_model.add(layers.Dense(32, input_shape=(4,)))
sample_model.add(layers.Dense(16, input_shape = (44,)))
sample_model.add(layers.Dense(8, input_shape = (32,)))
The program will always check the first declared input shape layer and discard the rest. So, if you start your first layer with input_shape = (44,), you need to pass exact feature numbers to your model as input such as:
sam_x = np.random.rand(10,44)
sam_y = np.array([0,1,1,0,1,0,0,1,0,1,])
sample_model.fit(sam_x,sam_y)
Additionally, if you look at the Functional API, unlike the Sequential model, you must create and define a standalone Input layer that specifies the shape of input data. It's not learnable but simply a spec layer. It's a kind of gateway of the input data for the model. That means even if we define input_shape inside the other layers, they all will be discarded. For example:
nputs = keras.Input(shape=(4,))
dense = layers.Dense(64, input_shape=(8,)) # dicard input_shape
x = dense(inputs)
x = layers.Dense(64, input_shape=(16,))(x) # dicard input_shape
outputs = layers.Dense(10)(x)
model = keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")
Here is a more complex example with Conv2D and MNIST.
encoder_input = keras.Input(shape=(28, 28, 1),)
x = layers.Conv2D(16, 3, activation="relu", input_shape=[32,32,3])(encoder_input)
x = layers.Conv2D(32, 3, activation="relu", input_shape=[64,64,3])(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation="relu", input_shape=[224,321,3])(x)
x = layers.Conv2D(16, 3, activation="relu", input_shape=[420,32,3])(x)
x = layers.GlobalMaxPooling2D()(x)
out = layers.Dense(10, activation='softmax')(x)
encoder = keras.Model(encoder_input, out, name="encoder")
encoder.summary()
Model: "encoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_15 (InputLayer) [(None, 28, 28, 1)] 0
_________________________________________________________________
conv2d_8 (Conv2D) (None, 26, 26, 16) 160
_________________________________________________________________
conv2d_9 (Conv2D) (None, 24, 24, 32) 4640
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 8, 8, 32) 0
_________________________________________________________________
conv2d_10 (Conv2D) (None, 6, 6, 32) 9248
_________________________________________________________________
conv2d_11 (Conv2D) (None, 4, 4, 16) 4624
_________________________________________________________________
global_max_pooling2d_2 (Glob (None, 16) 0
_________________________________________________________________
dense_56 (Dense) (None, 10) 170
=================================================================
Total params: 18,842
Trainable params: 18,842
Non-trainable params: 0
def pre_process(image, label):
return (image / 256)[...,None].astype('float32'),
tf.keras.utils.to_categorical(label, num_classes=10)
(x, y), (_, _) = tf.keras.datasets.mnist.load_data('mnist')
encoder.compile(
loss = tf.keras.losses.CategoricalCrossentropy(),
metrics = tf.keras.metrics.CategoricalAccuracy(),
optimizer = tf.keras.optimizers.Adam())
encoder.fit(x, y, batch_size=256)
4s 14ms/step - loss: 1.4303 - categorical_accuracy: 0.5279
I think Keras will create (or preserves to create) an additional Input Layer - but as the second dense layer is added using model.add() it will automatically be connected to the layer before, and thus the extra input layer stays unconnected and is not part of the model.
(I agree that it would be nice of Keras to hint at unconnected layers, I sometimes created unconnected layers when using the functional API and changed the inputs. Keras doesn't remind me that I had jumped several layers, I just wondered why the summary() was so short...)

GradientTape returns None

I am trying to use grad-CAM (I'm following this https://www.pyimagesearch.com/2020/03/09/grad-cam-visualize-class-activation-maps-with-keras-tensorflow-and-deep-learning/ from PyImageSearch) on a CNN I'm using transfer learning on.
In particular, I am using a simple CNN for a regression problem. I used MobileNetV2 with an Average Pooling layer and a Dense layer with one unit on top, as shown below:
base_model = MobileNetV2(include_top=False, input_shape=(224, 224, 3), weights='imagenet')
base_model.trainable = False
inputs = keras.Input(shape=(224, 224, 3))
x = base_model(inputs)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(1, activation="linear")(x)
model = keras.Model(inputs, outputs)
and the summary is:
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
mobilenetv2_1.00_224 (Model) (None, 7, 7, 1280) 2257984
_________________________________________________________________
global_average_pooling2d (Gl (None, 1280) 0
_________________________________________________________________
dense (Dense) (None, 1) 1281
=================================================================
Total params: 2,259,265
Trainable params: 1,281
Non-trainable params: 2,257,984
_________________________________________________________________
I initialize the CAM object with:
pred = 0.35
cam = GradCAM(model, pred, layerName='input_2')
where pred is the predicted output on which I want to inspect the CAM and I also specify the layer name in order to refer to the input layer. Then I compute the heatmap on a sample image "img":
heatmap = cam.compute_heatmap(img)
Now, let's focus on a part of the implementation of the function compute_heatmap from PyImageSearch:
# record operations for automatic differentiation
with tf.GradientTape() as tape:
# cast the image tensor to a float-32 data type, pass the
# image through the gradient model, and grab the loss
# associated with the specific class index
inputs = tf.cast(image, tf.float32)
(convOutputs, predictions) = gradModel(inputs)
# loss = predictions[:, self.classIdx] # original from PyImageSearch
loss = predictions[:] # modified by me as I have only 1 output unit
# use automatic differentiation to compute the gradients
grads = tape.gradient(loss, convOutputs)
The problem here is that the gradient grads is None.
I thought that maybe the problem could lie in the network structure (all goes fine when reproducing the example of the classification task from the website), but I can't figure out where is the problem with this network used for regression!
Could you please help me?

Getting intermediate layer output from a nested network - Keras

I have a U-net network with VGG16 encoder architecture with pre-trained imagenet weights. Since my input images are grayscale, I added in a convolutional layer with depth 3 prior to sending the input to the U-net model.
Now, I'm trying to get the output of an intermediate layer within the U-net network. I create an intermediate model whose output is the output of the layer that I'm interested in. Here is my code:
base_model = sm.Unet('vgg16', encoder_weights='imagenet', classes=1, activation='sigmoid')
inp = Input(shape=(448, 224, 1))
l1 = Conv2D(3, (1,1))(inp)
out = base_model(l1)
model = Model(inp, out)
model.summary()
intermediate_layer_model = Model(inputs=model.layers[0].input,
outputs=model.get_layer('model_1').get_layer('center_block2_relu').output)
Here is the output:
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, 448, 224, 1) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 448, 224, 3) 6
_________________________________________________________________
model_1 (Model) multiple 23752273
=================================================================
Total params: 23,752,279
Trainable params: 23,748,247
Non-trainable params: 4,032
_________________________________________________________________
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_1:0", shape=(?, ?, ?, 3), dtype=float32) at layer "input_1". The following previous layers were accessed without issue: []
It seems to me that there is an issue with the U-net model having an input layer (input_1) and I'm not supplying this information during the construction of intermediate_layer_model. However, I expect that the intermediate model to take only the grayscale images as input and not require an additional 3-channel input.
Any help would be appreciated.

how to save custom trained model without full connect layer just like MobileNetV2 include_top=False

i want to save my trained model to .h5 without last two layers, in order to transfer learning using my custom model in the furture, just like MobileNetV2 include_top=False, can someone help me, thanks!
base_model = tf.keras.applications.mobilenet_v2.MobileNetV2(
alpha=1.0,
input_shape=IMG_SHAPE,
include_top=False,
weights='imagenet')
model = tf.keras.Sequential([
base_model,
tf.keras.layers.GlobalAveragePooling2D(),
tf.keras.layers.Dense(255, activation=tf.nn.softmax)
])
trained model like this:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
mobilenetv2_1.00_224 (Model) (None, 2, 2, 1280) 2257984
_________________________________________________________________
global_average_pooling2d (Gl (None, 1280) 0
_________________________________________________________________
dense (Dense) (None, 205) 262605
=================================================================
Total params: 2,520,589
Trainable params: 2,486,477
Non-trainable params: 34,112
_________________________________________________________________
when i try to using it for transfer learning
keras_model = loadModel(keras_model_path)
keras_model.summary()
input = keras_model.input
hidden = tf.keras.layers.GlobalMaxPooling2D()(keras_model.layers[-3].output)
out = tf.keras.layers.Dense(128, activation=tf.nn.softmax)(hidden)
model2 = tf.keras.Model(input, out)
model2.summary()
an error occurs
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_1:0", shape=(?, 64, 64, 3), dtype=float32) at layer "input_1". The following previous layers were accessed without issue: []
i want to save my trained model to .h5 without last two layers,
why don't you save the full model with model.save() and when you reload it for transfer learning, just remove the layers using:
model.layers.pop()
You can also remove the layers before saving the model but I wouldn't do that

Tensorflow with Keras: ValueError - expected dense_84 to have 2 dimensions, but got array with shape (100, 9, 1)

I am trying to use Tensorflow through Keras to build a network that uses time-series data to predict the next value, but I'm getting this error:
ValueError: Error when checking target: expected dense_84 to have 2 dimensions, but got array with shape (100, 9, 1)
What is causing this? I've tried reshaping the data as other posts have suggested, but to no avail so far. Here is the code:
import keras
import numpy as np
import os
from keras import losses
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten, Dropout
from keras.layers.convolutional import Conv1D, Conv2D
# add the desktop to our path so we can access the data
os.path.join("C:\\Users\\user\\Desktop")
# import data
data = np.genfromtxt("C:\\Users\\user\\Desktop\\aapl_blocks_10.csv",
delimiter=',')
# separate into inputs and outputs
X = data[:, :9]
X = np.expand_dims(X, axis=2) # reshape (409, 9) to (409, 9, 1) for network
Y = data[:, 9]
# separate into test and train data
X_train = X[:100]
X_test = X[100:]
Y_train = Y[:100]
Y_test = Y[100:]
# set parameters
batch_size = 20;
# define model
model = Sequential()
model.add(Conv1D(filters=20,
kernel_size=5,
input_shape=(9, 1),
padding='causal'))
model.add(Flatten())
model.add(Dropout(rate=0.3))
model.add(Dense(units=10))
model.add(Activation('relu'))
model.add(Dense(units=1))
model.compile(loss=losses.mean_squared_error,
optimizer='sgd',
metrics=['accuracy'])
# train model
model.fit(X_train, Y_train, epochs=10, batch_size=batch_size)
# evaluate model
model.evaluate(X_test, Y_test, batch_size=batch_size)
And here is the model summary:
Layer (type) Output Shape Param #
=================================================================
conv1d_43 (Conv1D) (None, 9, 20) 120
_________________________________________________________________
flatten_31 (Flatten) (None, 180) 0
_________________________________________________________________
dropout_14 (Dropout) (None, 180) 0
_________________________________________________________________
dense_83 (Dense) (None, 10) 1810
_________________________________________________________________
activation_29 (Activation) (None, 10) 0
_________________________________________________________________
dense_84 (Dense) (None, 1) 11
=================================================================
Total params: 1,941
Trainable params: 1,941
Non-trainable params: 0
If there's a proper way to be formatting the data, or maybe a proper way to stack these layers, I would love to know.
I suspect you need to squeeze the channel dimension from the output, i.e. the labes are shape (batch_size, 9) and you're comparing that against the output of a dense layer with 1 channel which has size (batch_size, 9, 1). Solution: squeeze/flatten before calculating the loss.
...
model.add(Activation('relu'))
model.add(Dense(units=1))
model.add(Flatten())
model.compile(loss=losses.mean_squared_error,
optimizer='sgd',
metrics=['accuracy'])
A note on squeeze vs Flatten: in this case, the result of squeezing (removing an axis of dimension 1) and flattening (making something of shape (batch_size, n, m, ...) into shape (batch_size, nm...) will be the same. Squeeze might be slightly more appropriate in this case, since if you accidentally squeeze an axis without dimension 1 you'll get an error (a good thing), as opposed to having your program run with unexpected behaviour. I don't use keras much though and couldn't find a 'Squeeze' layer - just a squeeze function - and I'm not entirely sure how to integrate it.