Keras, calculating gradients of the loss wrt the input on an LSTM - tensorflow

I am quite new to machine learning and I was experimenting with adversarial examples.
I am trying to fool a binary character-level LSTM text classifier.
Thus I need the gradient of the loss w.r.t. the input.
However, the gradients function returns None.
I already tried to get the gradients, like in this post
or this post, but the gradients function still returns None.
EDIT: I wanted to do something similar to what is done in this git repo.
I was thinking that the problem might be that the model is an LSTM classifier, but I am not sure at this point. It should still be possible to get these gradients even from an LSTM classifier, right?
Here is my code:
import numpy as np
from keras.preprocessing import sequence
from keras.models import load_model
import data
import pickle
import keras.backend as K

def adversary():
    model, valid_chars = loadModel()
    model.summary()

    # load data
    X, y, maxlen, _, max_features, indata = prepare_data(valid_chars)
    target = y[0]

    # Get the loss and gradient of the loss wrt the inputs
    target = np.asarray(target).astype('float32').reshape((-1, 1))
    loss = K.binary_crossentropy(target, model.output)
    print(target)
    print(model.output)
    print(model.input)
    print(loss)

    grads = K.gradients(loss, model.input)
    #f = K.function([model.input], [loss, grads])
    #print(f(X[1:2]))
    print(model.predict(X[0:1]))
    print(grads)
The output looks like this:
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (None, 74, 128) 5120
_________________________________________________________________
lstm_1 (LSTM) (None, 128) 131584
_________________________________________________________________
dropout_1 (Dropout) (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 1) 129
_________________________________________________________________
activation_1 (Activation) (None, 1) 0
=================================================================
Total params: 136,833
Trainable params: 136,833
Non-trainable params: 0
_________________________________________________________________
Maxlen: 74
Data preparing finished
[[0.]]
Tensor("activation_1/Sigmoid:0", shape=(?, 1), dtype=float32)
Tensor("embedding_1_input:0", shape=(?, 74), dtype=float32)
Tensor("logistic_loss_1:0", shape=(?, 1), dtype=float32)
[[1.1397913e-13]]
[None]
I was hoping to get the gradients of the loss w.r.t. the input data to see which of the characters has the most impact on the output.
Thus I could fool the classifier by modifying the respective characters.
Is this possible? If yes, what is wrong with my approach?
Thank you for your time.

Gradients can only be computed for "trainable" tensors, so you might want to wrap your input into tf.Variable().
As soon as you want to work with gradients, I would suggest doing it with TensorFlow, which integrates nicely with Keras. Below is my example of doing it; note that it works in eager execution mode (the default in TensorFlow 2.0).
def train_actor(self, sars):
    obs1, actions, rewards, obs2 = sars
    with tf.GradientTape() as tape:
        would_do_actions = self.compute_actions(obs1)
        score = tf.reduce_mean(self.critic(observations=obs1, actions=would_do_actions))
        loss = -score
    grads = tape.gradient(loss, self.actor.trainable_weights)
    self.optimizer.apply_gradients(zip(grads, self.actor.trainable_weights))
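That example takes gradients with respect to the trainable weights; for the asker's goal of gradients with respect to the input, a minimal sketch in the same GradientTape style (assuming eager execution and a continuous float input batch x_batch with labels y_true, both hypothetical names) would be:
import tensorflow as tf

x = tf.Variable(x_batch, dtype=tf.float32)  # wrap the input so the tape watches it
with tf.GradientTape() as tape:
    tape.watch(x)  # an explicit watch also works for plain (non-Variable) tensors
    y_pred = model(x)
    loss = tf.keras.losses.binary_crossentropy(y_true, y_pred)
input_grads = tape.gradient(loss, x)  # same shape as x
Note that for an integer token input this still yields None, for the reason given in the answer below.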

I just found this thread.
The gradients function returns None because the embedding layer is not differentiable.
The embedding layer is implemented with K.gather, which is not differentiable with respect to the integer indices, so there is no gradient to propagate back to the input.
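A common workaround is to take the gradient with respect to the output of the embedding layer rather than the raw integer input. A minimal sketch in the same Keras-backend style as the question (the layer index is an assumption based on the model summary above):
import keras.backend as K

# assumes model, target and X are defined as in the question
emb_output = model.layers[0].output  # (None, 74, 128), continuous
loss = K.binary_crossentropy(target, model.output)
grads = K.gradients(loss, emb_output)  # no longer [None]

# evaluate: map a batch of token ids to the loss and its gradient
f = K.function([model.input], [loss] + grads)
loss_val, grad_val = f([X[0:1]])
Summing the absolute values of grad_val over the embedding dimension then gives a rough importance score per character position.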

Related

Summary of models constructed for transfer learning in tensorflow keras

I'm using tensorflow 2.6 keras for transfer learning. Currently I take MobileNetV2: I take the input, apply some preprocessing using Lambda layers, feed this preprocessed input to MobileNetV2, then add a Dense layer and train the whole thing. Training, inference, etc. actually work as expected.
However, the summary of the model looks as follows:
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) [(None, 201, 189, 1)] 0
_________________________________________________________________
lambda (Lambda) (None, 201, 189, None) 0
_________________________________________________________________
lambda_1 (Lambda) (None, 201, 189, None) 0
_________________________________________________________________
mobilenetv2_1.00_224 (Functi (None, 7, 6, 1280) 2257984
_________________________________________________________________
flatten (Flatten) (None, 53760) 0
_________________________________________________________________
output (Dense) (None, 2) 107522
=================================================================
Total params: 2,365,506
Trainable params: 2,331,394
Non-trainable params: 34,112
So the MobileNetV2 structure is hidden and shown as one layer of type tensorflow.python.keras.engine.functional.Functional. If I print the summary of this layer, I get all the internal layers of the model. I have a script for automatic GradCam visualizations which looks for the last Conv layer of the model. If the model is constructed by hand using Lambda, Conv2D, Dense layers, then everything works fine. If I use a pretrained model, it currently fails, because the Conv layer is hidden inside this Functional layer.
How do I construct my modified MobileNetV2 model with my additional layers so that the full structure of the model is shown?
This is approximately how I construct my final model:
input = Input(shape=params.image_shape, name="input")
flow = input
flow = input_correction(flow, params)  # some Lambda layers

keras_model = MobileNetV2(
    input_shape=image_shape,
    weights='imagenet',
    include_top=False)

keras_model_output = keras_model(flow)
keras_model_input = input

keras_model_output = Flatten()(keras_model_output)
output = Dense(units=len(params.classes),
               activation=tf.nn.softmax,
               name="output")(keras_model_output)

model = Model(inputs=keras_model_input, outputs=output)
model.compile(...)
By default, summary doesn't show nested models. Just pass the expand_nested argument to summary:
model.summary(expand_nested=True)
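For the GradCam script mentioned in the question, an alternative is to leave the model nested and search for the last Conv layer recursively. A minimal sketch (the helper name is hypothetical):
from tensorflow.keras.layers import Conv2D

def find_last_conv(model):
    # return the last Conv2D layer, descending into nested sub-models
    last = None
    for layer in model.layers:
        if hasattr(layer, 'layers'):  # nested Functional/Sequential sub-model
            nested = find_last_conv(layer)
            if nested is not None:
                last = nested
        elif isinstance(layer, Conv2D):
            last = layer
    return last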

GradientTape returns None

I am trying to use Grad-CAM (I'm following this guide from PyImageSearch: https://www.pyimagesearch.com/2020/03/09/grad-cam-visualize-class-activation-maps-with-keras-tensorflow-and-deep-learning/) on a CNN I'm using transfer learning on.
In particular, I am using a simple CNN for a regression problem. I used MobileNetV2 with an Average Pooling layer and a Dense layer with one unit on top, as shown below:
base_model = MobileNetV2(include_top=False, input_shape=(224, 224, 3), weights='imagenet')
base_model.trainable = False
inputs = keras.Input(shape=(224, 224, 3))
x = base_model(inputs)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(1, activation="linear")(x)
model = keras.Model(inputs, outputs)
and the summary is:
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
mobilenetv2_1.00_224 (Model) (None, 7, 7, 1280) 2257984
_________________________________________________________________
global_average_pooling2d (Gl (None, 1280) 0
_________________________________________________________________
dense (Dense) (None, 1) 1281
=================================================================
Total params: 2,259,265
Trainable params: 1,281
Non-trainable params: 2,257,984
_________________________________________________________________
I initialize the CAM object with:
pred = 0.35
cam = GradCAM(model, pred, layerName='input_2')
where pred is the predicted output on which I want to inspect the CAM and I also specify the layer name in order to refer to the input layer. Then I compute the heatmap on a sample image "img":
heatmap = cam.compute_heatmap(img)
Now, let's focus on a part of the implementation of the function compute_heatmap from PyImageSearch:
# record operations for automatic differentiation
with tf.GradientTape() as tape:
    # cast the image tensor to a float-32 data type, pass the
    # image through the gradient model, and grab the loss
    # associated with the specific class index
    inputs = tf.cast(image, tf.float32)
    (convOutputs, predictions) = gradModel(inputs)
    # loss = predictions[:, self.classIdx]  # original from PyImageSearch
    loss = predictions[:]  # modified by me as I have only 1 output unit

# use automatic differentiation to compute the gradients
grads = tape.gradient(loss, convOutputs)
The problem here is that the gradient grads is None.
I thought that maybe the problem could lie in the network structure (all goes fine when reproducing the classification example from the website), but I can't figure out where the problem is with this network used for regression!
Could you please help me?
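Two things seem worth checking here (both are assumptions based on the code above). First, layerName='input_2' points at the input layer, so convOutputs is the raw image rather than a convolutional feature map; with MobileNetV2 nested as a single layer, the last conv layer lives inside base_model, not in model.layers. Second, since base_model.trainable = False, the tape may not record the frozen part of the forward pass unless the input is explicitly watched, e.g.:
with tf.GradientTape() as tape:
    inputs = tf.cast(image, tf.float32)
    tape.watch(inputs)  # force the tape to record the frozen forward pass
    (convOutputs, predictions) = gradModel(inputs)
    loss = predictions[:, 0]  # single regression output
grads = tape.gradient(loss, convOutputs)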

CRF layer implementation with BiLSTM-CRF in TensorFlow 1.15

I implemented a bidirectional Long Short-Term Memory Neural Network with a Conditional Random Field layer (BiLSTM-CRF) using keras & keras_contrib (the latter for implementing the CRF, which is not part of native keras functionality). The task was Named Entity Recognition classification into one of 6 classes. The input to the network is a sequence of 300-dimensional pretrained GloVe word embeddings. This is my model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 648) 0
_________________________________________________________________
embedding_1 (Embedding) (None, 648, 300) 1500000
_________________________________________________________________
bidirectional_1 (Bidirection (None, 648, 1000) 3204000
_________________________________________________________________
crf_1 (CRF) (None, 648, 6) 6054
=================================================================
Now I want to implement the same model in TensorFlow 1.15. Since the keras_contrib CRF module only works in keras but not TensorFlow, I used the CRF implementation built for TensorFlow 1.X from this repo. The repo includes two nice example implementations of the CRF here, but each produces a different error when trained on my data.
Implementation 1
from tensorflow.keras.layers import Bidirectional, Embedding, LSTM, TimeDistributed
from tensorflow.keras.models import Sequential
from tf_crf_layer.layer import CRF
from tf_crf_layer.loss import crf_loss
from tf_crf_layer.metrics import crf_accuracy
MAX_WORDS = 50000
EMBEDDING_LENGTH = 300
MAX_SEQUENCE_LENGTH = 648
HIDDEN_SIZE = 512
model = Sequential()
model.add(Embedding(MAX_WORDS, EMBEDDING_LENGTH, input_length=MAX_SEQUENCE_LENGTH, mask_zero=True, weights=[embedding_matrix], trainable=False))
model.add(Bidirectional(LSTM(HIDDEN_SIZE, return_sequences=True)))
model.add(CRF(len(labels)))
model.compile('adam', loss=crf_loss, metrics=[crf_accuracy])
This is the error I get when I try to compile the model:
File "/.../tf_crf_layer/metrics/crf_accuracy.py", line 48, in crf_accuracy
crf, idx = y_pred._keras_history[:2]
AttributeError: 'Tensor' object has no attribute '_keras_history'
The error arises when computing crf_accuracy from the repo mentioned above.
def crf_accuracy(y_true, y_pred):
    """
    Get default accuracy based on CRF `test_mode`.
    """
    import pdb; pdb.set_trace()
    crf, idx = y_pred._keras_history[:2]
    if crf.test_mode == 'viterbi':
        return crf_viterbi_accuracy(y_true, y_pred)
    else:
        return crf_marginal_accuracy(y_true, y_pred)
Apparently this kind of error happens when a tensor object is not the output of a keras layer, as per this thread. Why does this error surface here?
Implementation 2
from tf_crf_layer.layer import CRF
from tf_crf_layer.loss import crf_loss, ConditionalRandomFieldLoss
from tf_crf_layer.metrics import crf_accuracy
from tf_crf_layer.metrics.sequence_span_accuracy import SequenceSpanAccuracy
model = Sequential()
model.add(Embedding(MAX_WORDS, EMBEDDING_LENGTH, input_length=MAX_SEQUENCE_LENGTH, mask_zero=True, weights=[embedding_matrix], trainable=False))
model.add(Bidirectional(LSTM(HIDDEN_SIZE, return_sequences=True)))
model.add(CRF(len(labels), name="crf_layer"))
model.summary()
crf_loss_instance = ConditionalRandomFieldLoss()
model.compile(loss={"crf_layer": crf_loss_instance}, optimizer='adam', metrics=[SequenceSpanAccuracy()])
Here the model compiles, but as soon as the first epoch of training begins, this error surfaces:
InvalidArgumentError: Expected begin and size arguments to be 1-D tensors of size 3, but got shapes [2] and [2] instead.
[[{{node loss_4/crf_layer_loss/Slice_1}}]]
I'm training the model using mini-batches; could that explain the error? I also noticed that my model summary for the CRF layer lacks a dimension (compare the CRF layer specification in the summary above and in the summary below), although the number of parameters for that layer is the same as above. What is causing this mismatch and how can it be fixed?
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_5 (Embedding) (None, 648, 300) 1500000
_________________________________________________________________
bidirectional_5 (Bidirection (None, 648, 1000) 3204000
_________________________________________________________________
crf_layer (CRF) (None, 648) 6054
=================================================================

Tensorflow keras Sequential .add is different than inline definition?

Keras is giving different results when I define my model via inline declaration instead of the .add() method. The two models appear to be equivalent, but using the ".add()" syntax works while the inline syntax gives errors -- it's a different error each time, but usually something like:
A target array with shape (10, 1) was passed for an output of shape (None, 16) while using as loss `mean_squared_error`. This loss expects targets to have the same shape as the output.
There seems to be something going on with auto-conversion of input shapes, but I can't tell what. Does anyone know what I'm doing wrong? Why aren't these two models exactly equivalent?
import tensorflow as tf
import tensorflow.keras
import numpy as np
x = np.arange(10).reshape((-1,1,1))
y = np.arange(10)
#This model works fine
model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(32, input_shape=(1, 1), return_sequences = True))
model.add(tf.keras.layers.LSTM(16))
model.add(tf.keras.layers.Dense(1))
model.add(tf.keras.layers.Activation('linear'))
#This model fails. But shouldn't this be equivalent to the above?
model2 = tf.keras.Sequential(
    {
        tf.keras.layers.LSTM(32, input_shape=(1, 1), return_sequences=True),
        tf.keras.layers.LSTM(16),
        tf.keras.layers.Dense(1),
        tf.keras.layers.Activation('linear')
    })
#This works
model.compile(loss='mean_squared_error', optimizer='adagrad')
model.fit(x, y, epochs=1, batch_size=1, verbose=2)
#But this doesn't! Why not? The error is different each time, but usually
#something about the input size being wrong
model2.compile(loss='mean_squared_error', optimizer='adagrad')
model2.fit(x, y, epochs=1, batch_size=1, verbose=2)
Why aren't those two models equivalent? Why does one handle the input size correctly but the other doesn't? The second model fails with a different error each time (once in a while it even works), so I thought maybe there's some interaction with the first model? But I've tried commenting out the first model and that doesn't help. So why doesn't the second one work?
UPDATE: Here is the model.summary() for the first and second model. They do seem different, but I don't understand why.
For model.summary():
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 1, 32) 4352
_________________________________________________________________
lstm_1 (LSTM) (None, 16) 3136
_________________________________________________________________
dense (Dense) (None, 1) 17
_________________________________________________________________
activation (Activation) (None, 1) 0
=================================================================
Total params: 7,505
Trainable params: 7,505
Non-trainable params: 0
For model2.summary():
model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_2 (LSTM) (None, 1, 32) 4352
_________________________________________________________________
activation_1 (Activation) (None, 1, 32) 0
_________________________________________________________________
lstm_3 (LSTM) (None, 16) 3136
_________________________________________________________________
dense_1 (Dense) (None, 1) 17
=================================================================
Total params: 7,505
Trainable params: 7,505
Non-trainable params: 0
When you are creating the model with the inline declarations, you put the layers in curly braces {}, which makes it a set, which is inherently unordered. Change the curly braces to square brackets [] to put them in an ordered list. This will make sure that the layers are in the correct order in your model.
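With that change, the second model definition becomes:
model2 = tf.keras.Sequential(
    [
        tf.keras.layers.LSTM(32, input_shape=(1, 1), return_sequences=True),
        tf.keras.layers.LSTM(16),
        tf.keras.layers.Dense(1),
        tf.keras.layers.Activation('linear')
    ])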

Tensorflow with Keras: ValueError - expected dense_84 to have 2 dimensions, but got array with shape (100, 9, 1)

I am trying to use Tensorflow through Keras to build a network that uses time-series data to predict the next value, but I'm getting this error:
ValueError: Error when checking target: expected dense_84 to have 2 dimensions, but got array with shape (100, 9, 1)
What is causing this? I've tried reshaping the data as other posts have suggested, but to no avail so far. Here is the code:
import keras
import numpy as np
import os
from keras import losses
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten, Dropout
from keras.layers.convolutional import Conv1D, Conv2D

# add the desktop to our path so we can access the data
os.path.join("C:\\Users\\user\\Desktop")

# import data
data = np.genfromtxt("C:\\Users\\user\\Desktop\\aapl_blocks_10.csv",
                     delimiter=',')

# separate into inputs and outputs
X = data[:, :9]
X = np.expand_dims(X, axis=2)  # reshape (409, 9) to (409, 9, 1) for network
Y = data[:, 9]

# separate into test and train data
X_train = X[:100]
X_test = X[100:]
Y_train = Y[:100]
Y_test = Y[100:]

# set parameters
batch_size = 20

# define model
model = Sequential()
model.add(Conv1D(filters=20,
                 kernel_size=5,
                 input_shape=(9, 1),
                 padding='causal'))
model.add(Flatten())
model.add(Dropout(rate=0.3))
model.add(Dense(units=10))
model.add(Activation('relu'))
model.add(Dense(units=1))
model.compile(loss=losses.mean_squared_error,
              optimizer='sgd',
              metrics=['accuracy'])

# train model
model.fit(X_train, Y_train, epochs=10, batch_size=batch_size)

# evaluate model
model.evaluate(X_test, Y_test, batch_size=batch_size)
And here is the model summary:
Layer (type) Output Shape Param #
=================================================================
conv1d_43 (Conv1D) (None, 9, 20) 120
_________________________________________________________________
flatten_31 (Flatten) (None, 180) 0
_________________________________________________________________
dropout_14 (Dropout) (None, 180) 0
_________________________________________________________________
dense_83 (Dense) (None, 10) 1810
_________________________________________________________________
activation_29 (Activation) (None, 10) 0
_________________________________________________________________
dense_84 (Dense) (None, 1) 11
=================================================================
Total params: 1,941
Trainable params: 1,941
Non-trainable params: 0
If there's a proper way to be formatting the data, or maybe a proper way to stack these layers, I would love to know.
I suspect you need to squeeze the channel dimension from the output, i.e. the labels have shape (batch_size, 9) and you're comparing that against the output of a dense layer with 1 channel, which has shape (batch_size, 9, 1). Solution: squeeze/flatten before calculating the loss.
...
model.add(Activation('relu'))
model.add(Dense(units=1))
model.add(Flatten())
model.compile(loss=losses.mean_squared_error,
              optimizer='sgd',
              metrics=['accuracy'])
A note on squeeze vs Flatten: in this case, the result of squeezing (removing an axis of dimension 1) and flattening (making something of shape (batch_size, n, m, ...) into shape (batch_size, n*m*...)) will be the same. Squeeze might be slightly more appropriate here, since if you accidentally squeeze an axis whose dimension is not 1 you'll get an error (a good thing), as opposed to having your program run with unexpected behaviour. I don't use keras much, though, and couldn't find a 'Squeeze' layer - just a squeeze function - and I'm not entirely sure how to integrate it.
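If you do want squeeze semantics, a minimal sketch is to wrap the backend squeeze function in a Lambda layer (an assumption on my part; keras has no built-in Squeeze layer):
from keras.layers import Lambda
import keras.backend as K

# removes the size-1 channel axis; errors out if that axis is not 1
model.add(Lambda(lambda t: K.squeeze(t, axis=-1)))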