CRF layer implementation with BiLSTM-CRF in TensorFlow 1.15 - tensorflow

I implemented a bidirectional Long Short-Term Memory neural network with a Conditional Random Field layer (BiLSTM-CRF) using keras and keras_contrib (the latter for the CRF, which is not part of native keras). The task was Named Entity Recognition, i.e. classification into one of 6 classes. The input to the network is a sequence of 300-dimensional pretrained GloVe word embeddings. This is my model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 648) 0
_________________________________________________________________
embedding_1 (Embedding) (None, 648, 300) 1500000
_________________________________________________________________
bidirectional_1 (Bidirection (None, 648, 1000) 3204000
_________________________________________________________________
crf_1 (CRF) (None, 648, 6) 6054
=================================================================
Now I want to implement the same model in TensorFlow 1.15. Since the keras_contrib CRF module only works in keras but not TensorFlow, I used the CRF implementation built for TensorFlow 1.X from this repo. The repo includes two nice example implementations of the CRF here, but each produces a different error when trained on my data.
Implementation 1
from tensorflow.keras.layers import Bidirectional, Embedding, LSTM, TimeDistributed
from tensorflow.keras.models import Sequential
from tf_crf_layer.layer import CRF
from tf_crf_layer.loss import crf_loss
from tf_crf_layer.metrics import crf_accuracy
MAX_WORDS = 50000
EMBEDDING_LENGTH = 300
MAX_SEQUENCE_LENGTH = 648
HIDDEN_SIZE = 512
model = Sequential()
model.add(Embedding(MAX_WORDS, EMBEDDING_LENGTH, input_length=MAX_SEQUENCE_LENGTH, mask_zero=True, weights=[embedding_matrix], trainable=False))
model.add(Bidirectional(LSTM(HIDDEN_SIZE, return_sequences=True)))
model.add(CRF(len(labels)))
model.compile('adam', loss=crf_loss, metrics=[crf_accuracy])
This is the error I get when I try to compile the model:
File "/.../tf_crf_layer/metrics/crf_accuracy.py", line 48, in crf_accuracy
crf, idx = y_pred._keras_history[:2]
AttributeError: 'Tensor' object has no attribute '_keras_history'
The error arises when computing crf_accuracy from the repo mentioned above.
def crf_accuracy(y_true, y_pred):
    """
    Get default accuracy based on CRF `test_mode`.
    """
    import pdb; pdb.set_trace()
    crf, idx = y_pred._keras_history[:2]
    if crf.test_mode == 'viterbi':
        return crf_viterbi_accuracy(y_true, y_pred)
    else:
        return crf_marginal_accuracy(y_true, y_pred)
Apparently this kind of error happens when a tensor object is not the output of a keras layer, as per this thread. Why does this error surface here?
Implementation 2
from tf_crf_layer.layer import CRF
from tf_crf_layer.loss import crf_loss, ConditionalRandomFieldLoss
from tf_crf_layer.metrics import crf_accuracy
from tf_crf_layer.metrics.sequence_span_accuracy import SequenceSpanAccuracy
model = Sequential()
model.add(Embedding(MAX_WORDS, EMBEDDING_LENGTH, input_length=MAX_SEQUENCE_LENGTH, mask_zero=True, weights=[embedding_matrix], trainable=False))
model.add(Bidirectional(LSTM(HIDDEN_SIZE, return_sequences=True)))
model.add(CRF(len(labels), name="crf_layer"))
model.summary()
crf_loss_instance = ConditionalRandomFieldLoss()
model.compile(loss={"crf_layer": crf_loss_instance}, optimizer='adam', metrics=[SequenceSpanAccuracy()])
Here the model compiles, but as soon as the first epoch of training begins, this error surfaces:
InvalidArgumentError: Expected begin and size arguments to be 1-D tensors of size 3, but got shapes [2] and [2] instead.
[[{{node loss_4/crf_layer_loss/Slice_1}}]]
I'm training the model using mini-batches; could that explain the error? I also noticed that the CRF layer in my model summary lacks a dimension (compare the CRF output shape in the summary above with the one in the summary below), although the number of parameters for that layer is the same as above. What is causing this mismatch and how can it be fixed?
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_5 (Embedding) (None, 648, 300) 1500000
_________________________________________________________________
bidirectional_5 (Bidirection (None, 648, 1000) 3204000
_________________________________________________________________
crf_layer (CRF) (None, 648) 6054
=================================================================
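As a sanity check on the two summaries, the parameter counts can be reproduced by hand. The sketch below assumes the standard LSTM parameter formula and assumes the CRF layer holds an emission kernel, a transition matrix, a bias and two boundary vectors; the helper names are illustrative and do not come from keras_contrib or tf_crf_layer.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)


def bilstm_params(input_dim, units):
    # A bidirectional wrapper holds one forward and one backward LSTM
    return 2 * lstm_params(input_dim, units)


def crf_params(input_dim, num_tags):
    # Assumed layout: emission kernel + transitions + bias + two boundary vectors
    kernel = input_dim * num_tags
    transitions = num_tags * num_tags
    bias = num_tags
    boundaries = 2 * num_tags
    return kernel + transitions + bias + boundaries


print("BiLSTM, 500 units per direction:", bilstm_params(300, 500))
print("BiLSTM, 512 units per direction:", bilstm_params(300, 512))
print("CRF on a 1000-wide input, 6 tags:", crf_params(1000, 6))
print("Embedding rows implied by 1500000 params:", 1500000 // 300)
```

Under these assumptions, the 3,204,000 and 6,054 figures match 500 LSTM units per direction (an output width of 1000, as in the second summary) and a 5,000-row embedding, rather than the HIDDEN_SIZE = 512 and MAX_WORDS = 50000 in the listings, so the summaries and the listings may come from different runs.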

Related

Summary of models constructed for transfer learning in tensorflow keras

I'm using TensorFlow 2.6 Keras for transfer learning, currently with MobileNetV2. I take the input, apply some preprocessing using Lambda layers, feed the preprocessed input to MobileNetV2, then add a Dense layer and train the result. Training, inference, etc. work as expected.
However, the summary of the model looks as follows:
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) [(None, 201, 189, 1)] 0
_________________________________________________________________
lambda (Lambda) (None, 201, 189, None) 0
_________________________________________________________________
lambda_1 (Lambda) (None, 201, 189, None) 0
_________________________________________________________________
mobilenetv2_1.00_224 (Functi (None, 7, 6, 1280) 2257984
_________________________________________________________________
flatten (Flatten) (None, 53760) 0
_________________________________________________________________
output (Dense) (None, 2) 107522
=================================================================
Total params: 2,365,506
Trainable params: 2,331,394
Non-trainable params: 34,112
So the MobileNetV2 structure is hidden and shown as one layer of type tensorflow.python.keras.engine.functional.Functional. If I print the summary of this layer, I get all the internal layers of the model. I have a script for automatic GradCam visualizations which looks for the last Conv layer of the model. If the model is constructed by hand using Lambda, Conv2D and Dense layers, then everything works fine. If I use a pretrained model, it currently fails, because the Conv layer is hidden inside this Functional layer.
How do I construct my modified MobileNetV2 model with my additional layers so that the full structure of the model is shown?
This is approximately how I construct my final model:
input = Input(shape=params.image_shape, name="input")
flow = input
flow = input_correction(flow, params)  # some Lambda layers
keras_model = MobileNetV2(
    input_shape=image_shape,
    weights='imagenet',
    include_top=False)
keras_model_output = keras_model(flow)
keras_model_input = input
keras_model_output = Flatten()(keras_model_output)
output = Dense(units=len(params.classes),
               activation=tf.nn.softmax,
               name="output")(keras_model_output)
model = Model(inputs=keras_model_input, outputs=output)
model.compile(...)
By default, summary() doesn't show nested models. Pass the expand_nested argument to summary():
model.summary(expand_nested=True)
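For the GradCam script itself, an alternative to reading the printed summary is to walk the nested models directly. The following is a minimal sketch, assuming that nested models expose a `layers` list (as Keras Model objects do) and that convolution layers have a class name containing "Conv". The names `iter_layers` and `find_last_conv`, and the stand-in classes, are hypothetical and exist only so the example runs without TensorFlow.

```python
def iter_layers(model):
    """Yield every leaf layer in order, descending into nested models."""
    for layer in model.layers:
        if hasattr(layer, "layers"):
            # A nested model: recurse instead of yielding the wrapper itself
            yield from iter_layers(layer)
        else:
            yield layer


def find_last_conv(model):
    """Return the last layer whose class name contains 'Conv', or None."""
    last = None
    for layer in iter_layers(model):
        if "Conv" in type(layer).__name__:
            last = layer
    return last


# Hypothetical stand-ins that mimic only the attributes used above
class Layer:
    def __init__(self, name):
        self.name = name


class Conv2D(Layer):
    pass


class Dense(Layer):
    pass


class Model:
    def __init__(self, name, layers):
        self.name = name
        self.layers = layers


backbone = Model("backbone", [Conv2D("conv_a"), Conv2D("conv_b")])
model = Model("model", [Layer("lambda"), backbone, Dense("output")])
print(find_last_conv(model).name)  # conv_b
```

With a real Keras model the same two functions should apply unchanged, though a layer found inside a nested model belongs to the inner model's graph, which the visualization code may need to account for.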

Issue in removing layer from a pretrained model

I have the following code. I need to remove some layers from the model and then run prediction, but I am currently getting an error.
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np
from keras.models import Model
from tensorflow.python.keras.optimizers import SGD
base_model = ResNet50(include_top=False, weights='imagenet')
model= Model(inputs=base_model.input, outputs=base_model .layers[-2].output)
#model = Model(inputs=base_model.input, outputs=predictions)
#Compiling the model
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy', metrics =
['accuracy'])
img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
preds = model.predict(x)
#decode the results into a list of tuples (class, description, probability)
#(one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])
error
File "C:/Users/learn/remove_layer.py", line 9, in <module>
model= Model(inputs=base_model.input, outputs=base_model .layers[-2].output)
AttributeError: 'Tensor' object has no attribute '_keras_shape'
With my beginner's knowledge of Keras, I understand this to be a shape issue. Since it's a ResNet model, I assume that cutting from one merge layer to another merge layer avoids dimension issues. How can I accomplish this?
You need to visualize what you have done, so let's look at a short summary of the last layers of the ResNet50 model:
base_model.summary()
conv5_block3_2_relu (Activation (None, None, None, 5 0 conv5_block3_2_bn[0][0]
__________________________________________________________________________________________________
conv5_block3_3_conv (Conv2D) (None, None, None, 2 1050624 conv5_block3_2_relu[0][0]
__________________________________________________________________________________________________
conv5_block3_3_bn (BatchNormali (None, None, None, 2 8192 conv5_block3_3_conv[0][0]
__________________________________________________________________________________________________
conv5_block3_add (Add) (None, None, None, 2 0 conv5_block2_out[0][0]
conv5_block3_3_bn[0][0]
__________________________________________________________________________________________________
conv5_block3_out (Activation) (None, None, None, 2 0 conv5_block3_add[0][0]
==================================================================================================
Total params: 23,587,712
Trainable params: 23,534,592
Non-trainable params: 53,120
_____________________________
And now your model after removing the last layer:
model.summary()
conv5_block3_2_relu (Activation (None, None, None, 5 0 conv5_block3_2_bn[0][0]
__________________________________________________________________________________________________
conv5_block3_3_conv (Conv2D) (None, None, None, 2 1050624 conv5_block3_2_relu[0][0]
__________________________________________________________________________________________________
conv5_block3_3_bn (BatchNormali (None, None, None, 2 8192 conv5_block3_3_conv[0][0]
__________________________________________________________________________________________________
conv5_block3_add (Add) (None, None, None, 2 0 conv5_block2_out[0][0]
conv5_block3_3_bn[0][0]
==================================================================================================
Total params: 23,587,712
Trainable params: 23,534,592
Non-trainable params: 53,120
With include_top=False, ResNet50 in Keras outputs the feature maps after the last Conv2D block; it does not include the classification part of the model. What you actually did was remove only the last activation layer after the last addition block.
So you need to check which block you want to remove, and then add a Flatten layer and fully connected layers for the classification part.
Also, as mentioned by Dr.Snoopy, don't mix imports between keras and tensorflow.keras:
# this part
from tensorflow.keras.models import Model

How to save a custom trained model without the fully connected layers, like MobileNetV2 with include_top=False

I want to save my trained model to .h5 without the last two layers, so that I can use my custom model for transfer learning in the future, just like MobileNetV2 with include_top=False. Can someone help me? Thanks!
base_model = tf.keras.applications.mobilenet_v2.MobileNetV2(
    alpha=1.0,
    input_shape=IMG_SHAPE,
    include_top=False,
    weights='imagenet')
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(255, activation=tf.nn.softmax)
])
The trained model looks like this:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
mobilenetv2_1.00_224 (Model) (None, 2, 2, 1280) 2257984
_________________________________________________________________
global_average_pooling2d (Gl (None, 1280) 0
_________________________________________________________________
dense (Dense) (None, 205) 262605
=================================================================
Total params: 2,520,589
Trainable params: 2,486,477
Non-trainable params: 34,112
_________________________________________________________________
When I try to use it for transfer learning:
keras_model = loadModel(keras_model_path)
keras_model.summary()
input = keras_model.input
hidden = tf.keras.layers.GlobalMaxPooling2D()(keras_model.layers[-3].output)
out = tf.keras.layers.Dense(128, activation=tf.nn.softmax)(hidden)
model2 = tf.keras.Model(input, out)
model2.summary()
an error occurs:
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_1:0", shape=(?, 64, 64, 3), dtype=float32) at layer "input_1". The following previous layers were accessed without issue: []
Regarding "I want to save my trained model to .h5 without the last two layers":
Why don't you save the full model with model.save() and, when you reload it for transfer learning, just remove the layers using:
model.layers.pop()
You can also remove the layers before saving the model, but I wouldn't do that.

Keras, calculating gradients of the loss wrt the input on an LSTM

I am quite new to machine learning and I was messing around with adversarial-examples.
I am trying to fool a binary character-level LSTM text classifier.
Thus I need the gradient of the loss w.r.t. the input.
However, the gradients function returns None.
I already tried to get the gradients, like in this post
or this post, but the gradients function still returns None.
EDIT: I wanted to do something similar to what is done in this git repo.
I was thinking that the problem might be that it is an LSTM classifier.
I am not sure at this point, but I think it should be possible to get these gradients even from an LSTM classifier, right?
Here is my code:
import numpy as np
from keras.preprocessing import sequence
from keras.models import load_model
import data
import pickle
import keras.backend as K
def adversary():
    model, valid_chars = loadModel()
    model.summary()
    # load data
    X, y, maxlen, _, max_features, indata = prepare_data(valid_chars)
    target = y[0]
    # Get the loss and gradient of the loss wrt the inputs
    target = np.asarray(target).astype('float32').reshape((-1, 1))
    loss = K.binary_crossentropy(target, model.output)
    print(target)
    print(model.output)
    print(model.input)
    print(loss)
    grads = K.gradients(loss, model.input)
    #f = K.function([model.input], [loss, grads])
    #print(f(X[1:2]))
    print(model.predict(X[0:1]))
    print(grads)
The output looks like this:
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (None, 74, 128) 5120
_________________________________________________________________
lstm_1 (LSTM) (None, 128) 131584
_________________________________________________________________
dropout_1 (Dropout) (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 1) 129
_________________________________________________________________
activation_1 (Activation) (None, 1) 0
=================================================================
Total params: 136,833
Trainable params: 136,833
Non-trainable params: 0
_________________________________________________________________
Maxlen: 74
Data preparing finished
[[0.]]
Tensor("activation_1/Sigmoid:0", shape=(?, 1), dtype=float32)
Tensor("embedding_1_input:0", shape=(?, 74), dtype=float32)
Tensor("logistic_loss_1:0", shape=(?, 1), dtype=float32)
[[1.1397913e-13]]
[None]
I was hoping to get the gradients of the loss w.r.t. the input data to see which of the characters has the most impact on the output.
Thus I could fool the classifier by modifying the respective characters.
Is this possible? If yes, what is wrong with my approach?
Thank you for your time.
Gradients can only be computed for "trainable" tensors, so you might want to wrap your input into tf.Variable().
As soon as you want to work with gradients, I would suggest doing it in TensorFlow, which integrates nicely with Keras. Below is my example of doing it; note that it works in eager execution mode (the default in TensorFlow 2.0).
def train_actor(self, sars):
    obs1, actions, rewards, obs2 = sars
    with tf.GradientTape() as tape:
        would_do_actions = self.compute_actions(obs1)
        score = tf.reduce_mean(self.critic(observations=obs1, actions=would_do_actions))
        loss = - score
    grads = tape.gradient(loss, self.actor.trainable_weights)
    self.optimizer.apply_gradients(zip(grads, self.actor.trainable_weights))
I just found this thread.
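The gradients function returns None because the embedding layer is not differentiable with respect to its input.
The embedding layer is implemented as K.gather, a lookup by integer index, which is not differentiable with respect to those indices, so there is no gradient with respect to the model input.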
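A common workaround, sketched below in plain NumPy with a toy model instead of the LSTM from the question, is to take the gradient with respect to the embedded vectors rather than the integer indices. All weights, shapes and names here are made up for illustration; the point is only that the loss is differentiable once the lookup has happened.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, seq_len = 40, 8, 5

embedding = rng.normal(size=(vocab_size, embed_dim))  # lookup table
w = 0.1 * rng.normal(size=embed_dim)                  # toy classifier weights
pos_weight = np.linspace(0.5, 1.5, seq_len)           # toy per-position weights


def loss_from_embedded(embedded, target):
    """Binary cross-entropy of a toy linear classifier on embedded vectors."""
    logit = np.sum(pos_weight * (embedded @ w))
    prob = 1.0 / (1.0 + np.exp(-logit))
    loss = -(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))
    return loss, prob


tokens = np.array([3, 17, 5, 29, 11])  # integer indices: no gradient is defined here
embedded = embedding[tokens]           # shape (seq_len, embed_dim): real-valued
target = 1.0

loss, prob = loss_from_embedded(embedded, target)

# Analytic gradient of the loss with respect to each embedded vector:
# dL/dlogit = prob - target, and dlogit/d(embedded[t]) = pos_weight[t] * w
grad = (prob - target) * pos_weight[:, None] * w[None, :]

# One scalar per input position: how strongly that character affects the loss
saliency = np.linalg.norm(grad, axis=1)
print("most influential position:", int(np.argmax(saliency)))
```

In a Keras model the analogous step would be to differentiate with respect to the output of the embedding layer, then use the per-position gradient norms to decide which characters to modify.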

Tensorflow with Keras: ValueError - expected dense_84 to have 2 dimensions, but got array with shape (100, 9, 1)

I am trying to use Tensorflow through Keras to build a network that uses time-series data to predict the next value, but I'm getting this error:
ValueError: Error when checking target: expected dense_84 to have 2 dimensions, but got array with shape (100, 9, 1)
What is causing this? I've tried reshaping the data as other posts have suggested, but to no avail so far. Here is the code:
import keras
import numpy as np
import os
from keras import losses
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten, Dropout
from keras.layers.convolutional import Conv1D, Conv2D
# add the desktop to our path so we can access the data
os.path.join("C:\\Users\\user\\Desktop")
# import data
data = np.genfromtxt("C:\\Users\\user\\Desktop\\aapl_blocks_10.csv",
                     delimiter=',')
# separate into inputs and outputs
X = data[:, :9]
X = np.expand_dims(X, axis=2) # reshape (409, 9) to (409, 9, 1) for network
Y = data[:, 9]
# separate into test and train data
X_train = X[:100]
X_test = X[100:]
Y_train = Y[:100]
Y_test = Y[100:]
# set parameters
batch_size = 20;
# define model
model = Sequential()
model.add(Conv1D(filters=20,
                 kernel_size=5,
                 input_shape=(9, 1),
                 padding='causal'))
model.add(Flatten())
model.add(Dropout(rate=0.3))
model.add(Dense(units=10))
model.add(Activation('relu'))
model.add(Dense(units=1))
model.compile(loss=losses.mean_squared_error,
              optimizer='sgd',
              metrics=['accuracy'])
# train model
model.fit(X_train, Y_train, epochs=10, batch_size=batch_size)
# evaluate model
model.evaluate(X_test, Y_test, batch_size=batch_size)
And here is the model summary:
Layer (type) Output Shape Param #
=================================================================
conv1d_43 (Conv1D) (None, 9, 20) 120
_________________________________________________________________
flatten_31 (Flatten) (None, 180) 0
_________________________________________________________________
dropout_14 (Dropout) (None, 180) 0
_________________________________________________________________
dense_83 (Dense) (None, 10) 1810
_________________________________________________________________
activation_29 (Activation) (None, 10) 0
_________________________________________________________________
dense_84 (Dense) (None, 1) 11
=================================================================
Total params: 1,941
Trainable params: 1,941
Non-trainable params: 0
If there's a proper way to be formatting the data, or maybe a proper way to stack these layers, I would love to know.
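To narrow down where the (100, 9, 1) target comes from, it can help to print the shapes produced by the slicing in the listing. This is a small sketch that uses a synthetic array in place of the CSV file (the real data is not available here), with the same 409 rows and 10 columns mentioned in the code comments.

```python
import numpy as np

# Synthetic stand-in for the CSV in the question (409 rows, 10 columns)
data = np.arange(409 * 10, dtype=float).reshape(409, 10)

X = np.expand_dims(data[:, :9], axis=2)  # inputs, with a channel axis added
Y = data[:, 9]                           # targets: last column only
X_train, Y_train = X[:100], Y[:100]

print("X_train:", X_train.shape)
print("Y_train:", Y_train.shape)

# An explicit 2-D target that lines up with a (None, 1) model output
Y_train_2d = Y_train.reshape(-1, 1)
print("Y_train_2d:", Y_train_2d.shape)
```

With this slicing, the only array of shape (100, 9, 1) is X_train, so if the failing run reports that shape for the target, it may be worth checking which arrays were actually passed to fit.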
I suspect you need to squeeze the channel dimension from the output, i.e. the labels have shape (batch_size, 9) and you're comparing them against the output of a dense layer with 1 channel, which has shape (batch_size, 9, 1). Solution: squeeze/flatten before calculating the loss.
...
model.add(Activation('relu'))
model.add(Dense(units=1))
model.add(Flatten())
model.compile(loss=losses.mean_squared_error,
              optimizer='sgd',
              metrics=['accuracy'])
A note on squeeze vs Flatten: in this case, the result of squeezing (removing an axis of dimension 1) and flattening (making something of shape (batch_size, n, m, ...) into shape (batch_size, nm...) will be the same. Squeeze might be slightly more appropriate in this case, since if you accidentally squeeze an axis without dimension 1 you'll get an error (a good thing), as opposed to having your program run with unexpected behaviour. I don't use keras much though and couldn't find a 'Squeeze' layer - just a squeeze function - and I'm not entirely sure how to integrate it.
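The difference between the two operations can be illustrated with NumPy arrays, used here as a stand-in for Keras tensors. This is only a sketch of the general behaviour described above; the shapes are chosen to mirror the (batch_size, 9, 1) case.

```python
import numpy as np

batch = np.zeros((20, 9, 1))

# On a trailing axis of size 1, both operations give the same shape
squeezed = np.squeeze(batch, axis=-1)          # drop the size-1 axis
flattened = batch.reshape(batch.shape[0], -1)  # merge all non-batch axes
print(squeezed.shape, flattened.shape)

# On an axis that is not size 1, flattening silently changes the layout...
wide = np.zeros((20, 9, 3))
print(wide.reshape(wide.shape[0], -1).shape)

# ...while squeezing with an explicit axis raises an error instead
squeeze_failed = False
try:
    np.squeeze(wide, axis=-1)
except ValueError as err:
    squeeze_failed = True
    print("squeeze refused:", err)
```

Note that the error only appears when the axis is named explicitly; calling np.squeeze without an axis argument simply leaves axes of other sizes untouched.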