Shared weight matrices between dense layers in keras - tensorflow

I am trying to reimplement this TensorFlow code in Keras; I have noted other questions submitted here, but none addresses what I am trying to do. The goal is to share a weight matrix across multiple dense layers.
import tensorflow as tf
# define input and weight matrices
x = tf.placeholder(shape=[None, 4], dtype=tf.float32)
w1 = tf.Variable(tf.truncated_normal(stddev=.1, shape=[4, 12]),
                 dtype=tf.float32)
w2 = tf.Variable(tf.truncated_normal(stddev=.1, shape=[12, 2]),
                 dtype=tf.float32)
# neural network
hidden_1 = tf.nn.tanh(tf.matmul(x, w1))
projection = tf.matmul(hidden_1, w2)
hidden_2 = tf.nn.tanh(projection)
hidden_3 = tf.nn.tanh(tf.matmul(hidden_2, tf.transpose(w2)))
y = tf.matmul(hidden_3, tf.transpose(w1))
# loss function and optimizer
loss = tf.reduce_mean(tf.reduce_sum((x - y) * (x - y), 1))
optimize = tf.train.AdamOptimizer().minimize(loss)
init = tf.initialize_all_variables()
The issue is reimplementing these weight matrices in Keras so that later layers use the transpose of the earlier layers' kernels. I am currently building my network with the Keras functional API.

Start by defining your two dense layers:
from keras.layers import Dense, Lambda
import keras.backend as K
dense1 = Dense(12, use_bias=False, activation='tanh')
dense2 = Dense(2, use_bias=False, activation='tanh')
You can then access the weights of your layers with, for example, dense1.weights[0]. You can wrap this in a Lambda layer that also transposes your weights:
h3 = Lambda(lambda x: K.dot(x, K.transpose(dense2.weights[0])))(h2)
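Putting it together, here is a minimal end-to-end sketch of the tied-weight autoencoder (my own wiring of the pieces above, not from the original answer; note that dense1 and dense2 must be called, and thereby built, before their weights are referenced inside a Lambda):
from keras.layers import Input, Dense, Lambda
from keras.models import Model
import keras.backend as K

inputs = Input(shape=(4,))
dense1 = Dense(12, use_bias=False, activation='tanh')
dense2 = Dense(2, use_bias=False, activation='tanh')

h1 = dense1(inputs)  # builds dense1, creating its kernel (w1)
h2 = dense2(h1)      # builds dense2, creating its kernel (w2)
# decode through the transposed kernels; the tanh matches hidden_3 in the TF code
h3 = Lambda(lambda t: K.tanh(K.dot(t, K.transpose(dense2.weights[0]))))(h2)
y = Lambda(lambda t: K.dot(t, K.transpose(dense1.weights[0])))(h3)

model = Model(inputs, y)
# 'mse' matches the sum-of-squares loss in the TF code up to a constant factor
model.compile(optimizer='adam', loss='mse')
Training then reconstructs the input, e.g. model.fit(data, data).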

Related

Why can't this model overfit one example?

I am practicing Conv1D on TensorFlow 2.7 and sanity-checking a decoder I developed by seeing whether it can overfit a single example. The model doesn't learn when trained on only one example and can't overfit it. I want to understand this strange behavior, please. This is the link to the notebook on Colab: Notebook.
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv1D, Dense, BatchNormalization
from tensorflow.keras.layers import ReLU, MaxPool1D, GlobalMaxPool1D
from tensorflow.keras import Model
import numpy as np
def Decoder():
    inputs = Input(shape=(68, 3), name='Input_Tensor')
    # First hidden layer
    conv1 = Conv1D(filters=64, kernel_size=1, name='Conv1D_1')(inputs)
    bn1 = BatchNormalization(name='BN_1')(conv1)
    relu1 = ReLU(name='ReLU_1')(bn1)
    # Second hidden layer
    conv2 = Conv1D(filters=64, kernel_size=1, name='Conv1D_2')(relu1)
    bn2 = BatchNormalization(name='BN_2')(conv2)
    relu2 = ReLU(name='ReLU_2')(bn2)
    # Third hidden layer
    conv3 = Conv1D(filters=64, kernel_size=1, name='Conv1D_3')(relu2)
    bn3 = BatchNormalization(name='BN_3')(conv3)
    relu3 = ReLU(name='ReLU_3')(bn3)
    # Fourth hidden layer
    conv4 = Conv1D(filters=128, kernel_size=1, name='Conv1D_4')(relu3)
    bn4 = BatchNormalization(name='BN_4')(conv4)
    relu4 = ReLU(name='ReLU_4')(bn4)
    # Fifth hidden layer
    conv5 = Conv1D(filters=1024, kernel_size=1, name='Conv1D_5')(relu4)
    bn5 = BatchNormalization(name='BN_5')(conv5)
    relu5 = ReLU(name='ReLU_5')(bn5)
    global_features = GlobalMaxPool1D(name='GlobalMaxPool1D')(relu5)
    global_features = tf.keras.layers.Reshape((1, -1))(global_features)
    conv6 = Conv1D(filters=12, kernel_size=1, name='Conv1D_6')(global_features)
    bn6 = BatchNormalization(name='BN_6')(conv6)
    outputs = ReLU(name='ReLU_6')(bn6)
    model = Model(inputs=[inputs], outputs=[outputs], name='Decoder')
    return model
model = Decoder()
model.summary()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
losses = tf.keras.losses.MeanSquaredError()
model.compile(optimizer=optimizer, loss=losses)
n = 1
X = np.random.rand(n, 68, 3)
y = np.random.rand(n, 1, 12)
model.fit(x=X,y=y, verbose=1, epochs=30)
I think the problem here is that you give the network almost nothing to learn from. With just one example, every epoch performs a single gradient step, so 30 epochs means only 30 weight updates, which is not enough to drive the loss down even on one example.
To see the model overfit, put the same example into your training dataset many times, so the weights get many update steps per epoch instead of one small step.
A deeper look into backpropagation might help you get a better understanding of the concept.
I took the liberty of adapting your notebook and enlarged the dataset as follows:
n = 1
X = np.random.rand(n, 68, 3)
y = np.random.rand(n, 1, 12)
for i in range(0, 10):  # each pass doubles the arrays: 2**10 = 1024 copies in total
    X = np.append(X, X, axis=0)
    y = np.append(y, y, axis=0)
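As an aside (my addition, not part of the original answer), the same duplication can be written in one step with np.repeat, starting from the single-example arrays:
X = np.repeat(X, 1024, axis=0)  # 1024 copies of the one example
y = np.repeat(y, 1024, axis=0)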
With this enlarged dataset, the output shows the training loss decreasing from epoch to epoch, i.e. the model now overfits the single example as intended.

How to apply Triplet Loss for a ResNet50 based Siamese Network in Keras or Tf 2

I have a ResNet-based Siamese network which tries to minimize the L2 distance between two images and then applies a sigmoid, giving a {0: 'same', 1: 'different'} output; based on how far off the prediction is, the gradients are flowed back through the network. The problem is that the gradient updates are too small, since the distance is squashed into [0, 1], so I thought of keeping the same architecture but training it with a triplet loss instead.
I1 = Input(shape=image_shape)
I2 = Input(shape=image_shape)
res_m_1 = ResNet50(include_top=False, weights='imagenet', input_tensor=I1, pooling='avg')
res_m_2 = ResNet50(include_top=False, weights='imagenet', input_tensor=I2, pooling='avg')
x1 = res_m_1.output
x2 = res_m_2.output
# x = Flatten()(x) or use this one if not using any pooling layer
distance = Lambda(lambda tensors: K.abs(tensors[0] - tensors[1]))([x1, x2])
final_output = Dense(1,activation='sigmoid')(distance)
siamese_model = Model(inputs=[I1,I2], outputs=final_output)
siamese_model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['acc'])
siamese_model.fit_generator(train_gen,steps_per_epoch=1000,epochs=10,validation_data=validation_data)
So how can I change it to use the triplet loss function? What adjustments need to be made here to get this done? One change is that I'll have to add a third branch for the negative image:
I3 = Input(shape=image_shape)
res_m_3 = ResNet50(include_top=False, weights='imagenet', input_tensor=I3, pooling='avg')
x3 = res_m_3.output
One thing I found in the TensorFlow Addons docs is TripletSemiHardLoss, which is given as:
tfa.losses.TripletSemiHardLoss()
As shown in the paper, the best results are from triplets known as "Semi-Hard". These are defined as triplets where the negative is farther from the anchor than the positive, but still produces a positive loss. To efficiently find these triplets we utilize online learning and only train from the Semi-Hard examples in each batch.
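In symbols (the standard formulation from the FaceNet paper the docs refer to): with anchor a, positive p, negative n and margin alpha, a semi-hard negative satisfies d(a, p) < d(a, n) < d(a, p) + alpha, so the triplet loss L = max(d(a, p) - d(a, n) + alpha, 0) is still positive for it.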
Another implementation of Triplet Loss which I found on Kaggle is: Triplet Loss Keras
Which one should I use and most importantly, HOW?
P.S: People also use something like: x = Lambda(lambda x: K.l2_normalize(x,axis=1))(x) after model.output. Why is that? What is this doing?
Following this answer of mine, and with the role of TripletSemiHardLoss in mind, we could do the following:
import tensorflow as tf
import tensorflow_addons as tfa
import tensorflow_datasets as tfds
from tensorflow.keras import models, layers

BATCH_SIZE = 32
LATENT_DIM = 128

def _normalize_img(img, label):
    img = tf.cast(img, tf.float32) / 255.
    return (img, label)

train_dataset, test_dataset = tfds.load(name="mnist", split=['train', 'test'], as_supervised=True)

# Build your input pipelines
train_dataset = train_dataset.shuffle(1024).batch(BATCH_SIZE)
train_dataset = train_dataset.map(_normalize_img)
test_dataset = test_dataset.batch(BATCH_SIZE)
test_dataset = test_dataset.map(_normalize_img)

inputs = layers.Input(shape=(28, 28, 1))
resNet50 = tf.keras.applications.ResNet50(include_top=False, weights=None, input_tensor=inputs, pooling='avg')
outputs = layers.Dense(LATENT_DIM, activation=None)(resNet50.output)  # No activation on final dense layer
outputs = layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=1))(outputs)  # L2-normalize the embedding
siamese_model = models.Model(inputs=inputs, outputs=outputs)

# Compile the model
siamese_model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tfa.losses.TripletSemiHardLoss())

# Train the network
history = siamese_model.fit(
    train_dataset,
    epochs=3)
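This also answers the P.S.: the final Lambda L2-normalizes the embeddings onto the unit hypersphere, which is what TripletSemiHardLoss expects and what makes distances between embeddings comparable. At inference time you compare embeddings directly; a minimal sketch (the variable names are illustrative):
import numpy as np
embeddings = siamese_model.predict(test_dataset)   # shape (N, LATENT_DIM), unit-norm rows
d = np.linalg.norm(embeddings[0] - embeddings[1])  # small distance -> likely same class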

Get gradients with respect to inputs in Keras ANN model

bce = tf.keras.losses.BinaryCrossentropy()
ll=bce(y_test[0], model.predict(X_test[0].reshape(1,-1)))
print(ll)
<tf.Tensor: shape=(), dtype=float32, numpy=0.04165391>
print(model.input)
<tf.Tensor 'dense_1_input:0' shape=(None, 195) dtype=float32>
model.output
<tf.Tensor 'dense_3/Sigmoid:0' shape=(None, 1) dtype=float32>
grads=K.gradients(ll, model.input)[0]
print(grads)
None
So here I have trained a neural network with two hidden layers; the input has 195 features and the output has size 1. I wanted to feed the network the validation instances in X_test one by one, with their correct labels in y_test, and for each instance compute the gradients of the output with respect to the input, but grads prints as None. Your help is appreciated.
One can do this using tf.GradientTape. (In your snippet, K.gradients returns None because ll was computed from model.predict, i.e. from concrete NumPy values that are not connected to the symbolic tensor model.input.) I wrote the following code to learn a sine wave and get its derivative, in the spirit of this question. I think it should be possible to extend this code to compute partial derivatives.
Importing the needed libraries:
import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import losses
import tensorflow as tf
Create the data:
x = np.linspace(0, 6*np.pi, 2000)
y = np.sin(x)
Defining a Keras NN:
def model_gen(Input_shape):
    X_input = Input(shape=Input_shape)
    X = Dense(units=64, activation='sigmoid')(X_input)
    X = Dense(units=64, activation='sigmoid')(X)
    X = Dense(units=1)(X)
    model = Model(inputs=X_input, outputs=X)
    return model
Training the model:
model = model_gen(Input_shape=(1,))
opt = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, decay=0.001)
model.compile(loss=losses.mean_squared_error, optimizer=opt)
model.fit(x,y, epochs=200)
To obtain the gradient of the network w.r.t. the input:
x = list(x)
x = tf.constant(x)
with tf.GradientTape() as t:
    t.watch(x)
    y = model(x)
dy_dx = t.gradient(y, x)
dy_dx.numpy()
One can further visualise dy_dx to check how smooth the derivative is. Finally, note that one gets a smoother derivative when using a smooth activation (e.g. sigmoid) instead of ReLU, as noted here.
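Applying the same idea to the original setting (195 input features, one sigmoid output), the tape yields one gradient entry per input feature. A sketch, assuming the model, X_test and y_test from the question, with y_test[0] a scalar label:
x_in = tf.convert_to_tensor(X_test[0].reshape(1, -1), dtype=tf.float32)
y_true = tf.constant([[float(y_test[0])]])
with tf.GradientTape() as t:
    t.watch(x_in)
    pred = model(x_in)  # call the model directly; model.predict would break the tape
    loss = tf.keras.losses.binary_crossentropy(y_true, pred)
grads = t.gradient(loss, x_in)  # shape (1, 195): d loss / d each input feature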

Customized loss in tensorflow with keras

OS Platform and Distribution: Linux Ubuntu 16.04; TensorFlow version: 1.4.0
I can run properly with the following code:
import tensorflow as tf
from tensorflow.python.keras.layers import Dense
from tensorflow.python.keras.backend import categorical_crossentropy
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.python.keras.models import Model
from tensorflow.python.keras.layers import Input
mnist_data = input_data.read_data_sets('MNIST_data', one_hot=True)
img_size_flat = 28*28
batch_size = 64
def gen(batch_size=32):
    while True:
        batch_data, batch_label = mnist_data.train.next_batch(batch_size)
        yield batch_data, batch_label
inputs = Input(shape=(img_size_flat,))
x = Dense(128, activation='relu')(inputs) # fully-connected layer with 128 units and ReLU activation
x = Dense(128, activation='relu')(x)
preds = Dense(10, activation='softmax')(x) # output layer with 10 units and a softmax activation
model = Model(inputs=inputs, outputs=preds)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit_generator(gen(batch_size), steps_per_epoch=len(mnist_data.train.labels)//batch_size, epochs=2)
But what if I want to write the loss function with my own code, like:
preds_softmax = tf.nn.softmax(preds)
step1 = tf.cast(y_true, tf.float32) * tf.log(preds_softmax)
step2 = -tf.reduce_sum(step1, reduction_indices=[1])
loss = tf.reduce_mean(step2) # loss
Can I use such a customized loss function and still train with Keras's model.fit_generator? Would it be something like the following TensorFlow code?
inputs = tf.placeholder(tf.float32, shape=(None, 784))
x = Dense(128, activation='relu')(inputs) # fully-connected layer with 128 units and ReLU activation
x = Dense(128, activation='relu')(x)
preds = Dense(10, activation='softmax')(x) # output layer with 10 units and a softmax activation
y_true = tf.placeholder(tf.float32, shape=(None, 10))
How can I do this based on the code above (part I)? Thanks for any help!
Just wrap your loss into a function, and provide it to model.compile.
def custom_loss(y_true, y_pred):
    # y_pred has already been through the model's softmax layer, so it is a
    # probability distribution; applying tf.nn.softmax again would squash it twice
    step1 = y_true * tf.log(y_pred)
    return -tf.reduce_sum(step1, reduction_indices=[1])
model.compile(optimizer='rmsprop',
              loss=custom_loss,
              metrics=['accuracy'])
Also note that:
you don't need to cast y_true to float32; Keras does that automatically.
you don't need to take the final reduce_mean; Keras will take care of that as well.
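One practical caveat (my addition, not part of the original answer): tf.log(0) is -inf, so it is common to clip the predicted probabilities away from zero first:
import tensorflow as tf
from tensorflow.python.keras import backend as K
def custom_loss(y_true, y_pred):
    y_pred = tf.clip_by_value(y_pred, K.epsilon(), 1.0)  # avoid log(0)
    return -tf.reduce_sum(y_true * tf.log(y_pred), reduction_indices=[1])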

How do I call model.predict with data stored in GPU in Keras

I have built and trained a Sequential model.
Now, before each model.predict call, I want to upload the data to the GPU, do some operations, and then call model.predict on the output while it is still in GPU memory, without downloading it to host memory only for Keras to upload it to the GPU again.
Edit:
I would like to run OpenCV operations on the input image on the GPU and, if possible, feed the output directly to model.predict.
You can achieve this by adding the operations as Lambda layers on top of the model. Here is a very simple example; you can extend from here:
import numpy as np
from keras import backend as K
from keras.models import Model
from keras.layers import Dense, Lambda, Input, Concatenate

X = np.random.random((1000, 5))
Y = np.random.random((1000, 1))

inp = Input(shape=(5,))
d1 = Dense(60, kernel_initializer='normal', activation='relu')
d2 = Dense(1, kernel_initializer='normal', activation='sigmoid')
out = d2(d1(inp))
model = Model(inputs=[inp], outputs=[out])
model.compile(loss='binary_crossentropy', optimizer='adam')
model.summary()
model.fit(X, Y, epochs=1)

X1 = np.random.random((10, 3))
X2 = np.random.random((10, 2))
inp1 = Input(shape=(3,))
inp2 = Input(shape=(2,))
# preprocessing ops wrapped as Lambda layers, executed on the GPU with the model
p1 = Lambda(lambda x: K.sqrt(x))(inp1)
p2 = Lambda(lambda x: K.exp(x))(inp2)
mer = Concatenate()([p1, p2])
out2 = d2(d1(mer))
model2 = Model(inputs=[inp1, inp2], outputs=[out2])
model2.summary()
ypred = model2.predict([X1, X2])
print(ypred.shape)
Here, from model.summary(), you can see that both models share the upper layers, so model2 in essence uses the weights already learnt during the training of the first model.
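Since d1 and d2 are the very same layer objects in both graphs, the sharing is easy to verify (a small check of my own):
print(d1 in model.layers, d1 in model2.layers)  # True True: one layer, two models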