I have a multi-input convolutional neural network model that inputs 2 images from 2 datasets to give one output which is the class of the two inputs. The two datasets have the same classes. I used 2 vgg16 models and concatenate them to classify the two images.
vgg16_model = keras.applications.vgg16.VGG16()
input_layer1= vgg16_model .input
last_layer1 = vgg16_model.get_layer('fc2').output
vgg16_model2 = keras.applications.vgg16.VGG16()
input_layer2= vgg16_model .input
last_layer2 = vgg16_model.get_layer('fc2').output
con = concatenate([last_layer1, last_layer2]) # merge the outputs of the two models
output_layer = Dense(no_classes, activation='softmax', name='prediction')(con)
multimodal_model1 = Model(inputs=[input_layer1, input_layer2], outputs=[output_layer])
My questions are:
1- Which case from the following represents how the images enter to the model?
One to One
database1-img1 + database2-img1
database1-img2 + database2-img2
database1-img3 + database2-img3
database1-img4 + database2-img4
.........
Many to many
database1-img1 + database2-img1
database1-img1 + database2-img2
database1-img1 + database2-img3
database1-img1 + database2-img4
database1-img2 + database2-img1
database1-img2 + database2-img2
database1-img2 + database2-img3
database1-img2 + database2-img4
.........
2- In general in deep learning, Does the images enter from the two datasets to the model at the same time have the same class (labels) or not?
It is a 1:1 mapping, the same should be with multiple outputs as well.
When you have a model such as Model(inputs=[input_layer1, input_layer2], outputs=[output_layer]) or even Model(inputs=[input_layer1, input_layer2], outputs=[output_layer1, output_layer2]) , You must feed it with inputs / output of the same shape.
Assume the other case - You will need to have ds1.shape[0] * ds2.shape[0] different labels, for each possible mix of the 2 datasets, and will need to have them ordered at a certain way. That is not really feasible, at least not simply.
2. Its not as if the same images have the same label, but the Pair of both images have a single label.
Related
There is something about the workings of GradientTape that escapes my understanding.
Suppose we want to train an agent on the classic bandit problem using an actor-critic RL framework. There are two bandits, A and B, and the agent must learn to select A, which yields higher returns on average. The training consists of, say, 1000 epochs, in each of which the agent draws, say, 100 samples from each bandit. The reward is 1 every time the agent selects A, and 0 otherwise.
Let's see how the agent learns by observing rewards over 10 training simulations. Here is the code defining the agent and the environment (neither needs to be more complicated than below).
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from keras import Model
from keras.optimizers import Adam
n_sims = 10 # number of simulations
for n in range(n_sims):
# define actors and optimizers for each simulation
actor_input = Input(shape=(2,))
actor_output = Dense(2, activation='softmax')(actor_input)
globals()[f'actor_{n}'] = Model(inputs=actor_input, outputs=actor_output)
globals()[f'actor_opt_{n}'] = Adam(learning_rate=.1)
# define critics and optimizers for each simulation
critic_input = Input(shape=(2,))
critic_output = Dense(1, activation='softmax')(critic_input)
globals()[f'critic_{n}'] = Model(inputs=critic_input, outputs=critic_output)
globals()[f'critic_opt_{n}'] = Adam(learning_rate=.1)
globals()[f'mean_rewards_{n}'] = [] # list to store rewards over training epochs for each simulation
A = np.random.normal(loc=10, scale=15, size=int(1e5)) # bandit A
B = np.random.normal(loc=0, scale=1, size=int(1e5)) # bandit B
n_training_epochs = 1000
n_samples = 100
Let's consider two alternative codes for the training loop using GradientTape, both based on a simple 'vanilla' loss function.
The first is the slow one and literally involves a for loop over the samples drawn in each epoch. Cumulative actor and critic's losses are iteratively computed, and then their means are used to update their respective network weights.
for _ in range(n_training_epochs):
A_samples = np.random.choice(A, size=n_samples)
B_samples = np.random.choice(B, size=n_samples)
for n in range(n_sims):
cum_actor_loss, cum_critic_loss, cum_reward = 0, 0, 0
with tf.GradientTape() as actor_tape, tf.GradientTape() as critic_tape:
for A_sample, B_sample in zip(A_samples, B_samples):
probs = globals()[f'actor_{n}'](tf.reshape([A_sample, B_sample], (1,-1)))[0]
action = np.random.choice(['A','B'], p=np.squeeze(probs))
reward = 1 if action == 'A' else 0
cum_reward += reward
action_prob = probs[['A','B'].index(action)]
value = globals()[f'critic_{n}'](tf.reshape([A_sample, B_sample], (1,-1)))[0]
advantage = reward - value
cum_actor_loss += -tf.math.log(action_prob)*advantage
cum_critic_loss += advantage**2
mean_actor_loss = cum_actor_loss/n_samples
mean_critic_loss = cum_critic_loss/n_samples
globals()[f'mean_rewards_{n}'].append(cum_reward/n_samples)
actor_grads = actor_tape.gradient(mean_actor_loss, globals()[f'actor_{n}'].trainable_variables)
globals()[f'actor_opt_{n}'].apply_gradients(zip(actor_grads, globals()[f'actor_{n}'].trainable_variables))
critic_grads = critic_tape.gradient(mean_critic_loss, globals()[f'critic_{n}'].trainable_variables)
globals()[f'critic_opt_{n}'].apply_gradients(zip(critic_grads, globals()[f'critic_{n}'].trainable_variables))
If you plot the average training rewards over each epoch, you'll probably get something like this figure
In the second option, instead of using an explicit for loop over samples in each epoch, we perform operations on arrays. This alternative is much faster in terms of computation time.
for _ in range(n_training_epochs):
A_samples = np.random.choice(A, size=n_samples)
B_samples = np.random.choice(B, size=n_samples)
for n in range(n_sims):
with tf.GradientTape() as actor_tape, tf.GradientTape() as critic_tape:
probs = globals()[f'actor_{n}'](tf.reshape([[A_sample, B_sample] for A_sample, B_sample in zip(A_samples, B_samples)], (n_samples,-1)))
actions = np.array([np.random.choice(['A','B'], p=np.squeeze(probs[i])) for i in range(len(probs))]).reshape(n_samples, -1)
rewards = np.array([1.0 if action == 'A' else 0.0 for action in actions]).reshape(n_samples, -1)
globals()[f'mean_rewards_{n}'].append(np.mean(rewards))
values = globals()[f'critic_{n}'](tf.reshape([[A_sample, B_sample] for A_sample, B_sample in zip(A_samples, B_samples)], (n_samples,-1)))
advantages = rewards + tf.math.negative(values)
actions_num = [['A','B'].index(action) for action in actions]
action_probs = tf.reduce_sum(tf.one_hot(actions_num, len(['A','B'])) * probs, axis=1)
mean_actor_loss = -tf.reduce_mean(advantages * tf.math.log(action_probs))
mean_critic_loss = tf.reduce_mean(tf.pow(advantages, 2))
actor_grads = actor_tape.gradient(mean_actor_loss, globals()[f'actor_{n}'].trainable_variables)
globals()[f'actor_opt_{n}'].apply_gradients(zip(actor_grads, globals()[f'actor_{n}'].trainable_variables))
critic_grads = critic_tape.gradient(mean_critic_loss, globals()[f'critic_{n}'].trainable_variables)
globals()[f'critic_opt_{n}'].apply_gradients(zip(critic_grads, globals()[f'critic_{n}'].trainable_variables))
Let's plot the average reward over epochs, to obtain something like this
As you can see the agent tends to learn earlier and more stably in the first case than in the second (where learning may not even happen), although the two training loops are in theory mathematically equivalent. How is that? The reason has probably something to do with the fact that, in the first option, GradientTape is watching the trainable variables several times per epoch before applying the gradient, whereas in the second option it does so only once. Even so, I can't figure out why exactly this produces the observed results. Can you help me understand?
So I am building a CNN that gets images using labels that go from 0 to 1.
What I mean is that I am trying to perform detection of one thing in the image and each image has a label between 0 and 1 that stands for the probability of said type of event being in that image.
I want to output this probability so I am using a sigmoid activation function in the output layer but I am having trouble in deciding what loss function makes sense in this situation. If my labels were 0 and 1s I would use Binary CrossEntropy but does that still make sense when my labels are floats ranging from 0 to 1?
Cheers.
This solution is for logits (output of last linear layer) not for output probabilities
def loss(logits, soft_labels):
anti_soft_labels = 1 - soft_labels
return soft_labels * tf.nn.softplus(-logits)
+ anti_soft_labels * tf.nn.softplus(logits) + tf.math.xlogy(soft_labels, soft_labels) + tf.math.xlogy(anti_soft_labels, anti_soft_labels)
loss(logits=tf.constant([10., 0, -10]), soft_labels=tf.constant([1., 0.5, 0.]))
# [4.53989e-05, 0.00000e+00, 4.53989e-05]
If you need to have 0 as minimal loss value for any soft label use
def loss(logits, soft_labels):
anti_soft_labels = 1 - soft_labels
return soft_labels * tf.nn.softplus(-logits) \
+ anti_soft_labels * tf.nn.softplus(logits) \
+ tf.math.xlogy(soft_labels, soft_labels) \
+ tf.math.xlogy(anti_soft_labels, anti_soft_labels)
loss(logits=tf.constant([10., 0, -10]), soft_labels=tf.constant([1., 0.5, 0.]))
# [4.53989e-05, 0.00000e+00, 4.53989e-05]```
I am training a classification problem using tensorflow estimators.
I want to calculate the f1 score for each batch to data along with precision and recall.
I calculate precision and recall using the code below and log them for evaluation and training.
I also calculate the fscore using the formula, but while logging the fscore I get an error.
pre = tf.metrics.precision(labels=labels,predictions=pred,name="precision")
rec = tf.metrics.recall(labels=labels,predictions=pred,name="recall")
fscore_val = tf.reduce_mean((2*pre[0]*rec[0]) / (pre[0] + rec[0] + 1e-5))
fscore_update = tf.group(pre[1], rec[1])
fscore = (fscore_val, fscore_update)
# logging metric at evaluation time
metrics['precision'] = pre
metrics['recall'] = rec
metrics['fscore'] = fscore
# logging metric at training time
tf.summary.scalar('precision', pre[1])
tf.summary.scalar('recall', rec[1])
tf.summary.scalar('fscore', fscore)
This is the error that I get.
TypeError: Expected float32, got <tf.Operation 'metrics_Left_Lane_Type/group_deps' type=NoOp> of type 'Operation' instead.
I understand why I am getting this error.
It is because fscore should be two values, similar to precision and recall.
Can someone please help me on how to do this in tensorflow estimators?
First of all, TensorFlow has it's own f1 score tf.contrib.metrics.f1_score and it is rather straightforward to use. The only possible downside is that it hides threshold value from user, choosing the best from specified quantity of possible thresholds.
predictions = tf.sigmoid(logits)
tf.contrib.metrics.f1_score(labels, predictions, num_thresholds=20)
If, for any reason, you want a custom implementation, you need to group update_ops. Every TensorFlow metric has operation that increments its value. You can set threshold manually when defining predictions
predictions = tf.greater(tf.sigmoid(logits), 0.5)
def f1_score(labels, predictions):
precision, update_op_precision = tf.metrics.precision(labels, predictions)
recall, update_op_recall = tf.metrics.recall(labels, predictions)
eps = 1e-5 #small constant for numerical stability
f1 = 2 * precision * recall / (precision + recall + eps)
f1_upd = 2 * update_op_precision * update_op_recall / (update_op_precision + update_op_recall + eps)
return f1, f1_upd
f1_score = f1_score(labels, predictions)
Then you can add it to eval_metric_ops dict or pass to summary.scalar
eval_metric_ops = {'f1': f1_score}
tf.summary.scalar('f1', f1_score[1])
It actually gives very close results with metric from contrib module
Here is the code:
def lstm(o, i, state):
#these are all calculated seperately, no overlap until....
#(input * input weights) + (output * weights for previous output) + bias
input_gate = tf.sigmoid(tf.matmul(i, w_ii) + tf.matmul(o,w_io) + b_i)
#(input * forget weights) + (output * weights for previous output) + bias
output_gate = tf.sigmoid(tf.matmul(i, w_oi) + tf.matmul(o,w_oo) + b_o)
#(input * forget weights) + (output * weights for previous output) + bias
forget_gate = tf.sigmoid(tf.matmul(i, w_fi) + tf.matmul(o,w_fo) + b_f)
memory_cell = tf.sigmoid(tf.matmul(i, w_ci) + tf.matmul(o,w_co) + b_c)
state = forget_gate * state + input_gate * memory_cell
output = output_gate * tf.tanh(state)
return output, state
And here is the drawing of the lstm:
I'm having trouble understanding how the two match up. Any help would be much appreciated.
This is an excellent blogpost on LSTMs. This code is directly implementing the LSTM; the code here is equivalent to the equations listed on Wikipedia:
The input & output weights reflect the state of the network. In a simple fully-connected (FC) layer, we'd only have one weight matrix, which is what we would use to calculate the output of the layer:
The advantage of a LSTM, however, is that it includes multiple sources of information, or state; this is what we refer to when we say that a LSTM has memory. We have the output gate, just like the FC layer, but we also have the forget gate, the input gate, the cell state, and the hidden state. These all combine to provide multiple, different, sources of information. The equations show how they come together to produce the output.
In the equations, x is the input, and f_t is the input gate. I would recommend reading the linked blogpost and the Wikipedia article to get an understanding of how the equations implement a LSTM.
The image depicts the input gate providing output to the cell based on the values from previous cells, and previous values of the input gate. The cell also incorporates the forget gate; the inputs are then fed into the output gate, which also takes previous values of the output gate as inputs.
How can I force certain dimensionality of the output of the conv2d_transpose layer ? My problem is that I use it for upsampling and I want to match the dimensionality of my labels and the output of the NN, for example if I have a feature map as Bx25x40xC how can I make it Bx100x160xC (i.e. upsample exactly 4x times)?
It seems like dimensions of the output can be calculated using
h = ((h_in - 1) * stride_h) + kernel_h - 2 * pad_h
w = ((w_in - 1) * stride_w) + kernel_w - 2 * pad_w
one can manipulate strides and kernels, but padding is controlled by 'same'/'valid' algorithms which, to my understanding, means they are pretty much uncontrollable, so is the resulting output size. For comparison, in caffe, one can at least force the padding in attempt to match the desired output explicitly.