So, I am trying to figure out how to use the new tf.data input pipeline framework in TF. The toy model I am using tries to memorize an image by training on pixel coordinates as inputs and RGB values as labels. The code I have at the moment goes something like this:
W=442
H=500
image = tf.read_file('kitteh.png')
image = tf.image.decode_png(image, channels=3)
# normalize to 0-1 range
image = (image - tf.reduce_min(image)) / (tf.reduce_max(image) - tf.reduce_min(image))
# features and labels
coordinates = tf.constant([(x, y) for x in range(W) for y in range(H)], dtype=tf.float32)
rgb = tf.reshape(image, [-1, 3])
# dataset and input pipeline
features = tf.data.Dataset.from_tensors(coordinates)
labels = tf.data.Dataset.from_tensors(rgb)
data = tf.data.Dataset.zip((features, labels))
batched = data.batch(100)
iterator = batched.make_one_shot_iterator()
inputs, labels = iterator.get_next()
def net(inputs, reuse=False):
    l1 = tf.layers.dense(inputs, 20, activation=tf.nn.relu, name='l1', reuse=reuse)
    l2 = tf.layers.dense(l1, 20, activation=tf.nn.relu, name='l2', reuse=reuse)
    l3 = tf.layers.dense(l2, 20, activation=tf.nn.relu, name='l3', reuse=reuse)
    l4 = tf.layers.dense(l3, 20, activation=tf.nn.relu, name='l4', reuse=reuse)
    l5 = tf.layers.dense(l4, 20, activation=tf.nn.relu, name='l5', reuse=reuse)
    l6 = tf.layers.dense(l5, 20, activation=tf.nn.relu, name='l6', reuse=reuse)
    l7 = tf.layers.dense(l6, 20, activation=tf.nn.relu, name='l7', reuse=reuse)
    return tf.layers.dense(l7, 3, activation=tf.nn.sigmoid, name='out', reuse=reuse)
model = net(inputs)
loss = tf.losses.mean_squared_error(labels, model)
step = tf.train.get_global_step()
train = tf.train.AdamOptimizer().minimize(loss, global_step=step)
test = net(coordinates, reuse=True)
with tf.Session() as session:
    session.run((tf.global_variables_initializer(), tf.local_variables_initializer()))
    orig = session.run(image)
    for i in range(50000):
        f, l = session.run([inputs, labels])
        print(f.shape, l.shape)
And here are the questions:
This code doesn't work. For whatever reason, the batch() function doesn't work right. When I try to print my label and input shapes, I expect to get (100, 2) and (100, 3), but I get (1, 221000, 2), (1, 221000, 3) and an OutOfRangeError. I seem to be following the "importing data" tutorial, but I do not get the expected result.
How do I get a full set of data from a dataset? I want to generate a complete picture every Nth step; can I get all the coordinates from the dataset?
I have the width and height of the image hard-coded, but it would be nice to get them from the decoded data. I tried W = image.get_shape()[0], but it resulted in my parser() function failing because W is not defined yet. Is there a solution?
Edit #1: updated the code to my latest attempt and updated the questions to reflect the latest problem I am getting.
Edit #3: it seems I made a mistake in my previous edit. The problem seems to be with batch() rather than zip(). When I print output shapes for data and batched datasets, I get the following
(TensorShape([Dimension(221000), Dimension(2)]),
TensorShape([Dimension(221000), Dimension(3)]))
(TensorShape([Dimension(None), Dimension(221000), Dimension(2)]),
TensorShape([Dimension(None), Dimension(221000), Dimension(3)]))
Solved the first and primary issue. It seems that there is a subtle difference between Dataset.from_tensors() and Dataset.from_tensor_slices(). Both take tensors as arguments, but the former treats the whole tensor as a single training sample, while the latter treats the first axis as the sample axis and the rest as data, thus splitting the tensor into samples along the first axis. Using the latter function fixed the problem I had.
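A minimal sketch of that fix, assuming the same coordinates and rgb tensors and the same batch size of 100 as in the code above:
# from_tensor_slices() slices along the first axis, so each element is one coordinate/RGB pair
features = tf.data.Dataset.from_tensor_slices(coordinates)   # elements of shape (2,)
labels = tf.data.Dataset.from_tensor_slices(rgb)             # elements of shape (3,)
data = tf.data.Dataset.zip((features, labels))
batched = data.batch(100)   # yields batches of (100, 2) and (100, 3); the last batch may be smaller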
As for questions 2 and 3, I would still love to hear any answers, but currently there seems to be no way to do it the way I want, so I have ended up using hard-coded image dimensions and the coordinates constant tensor to paint a complete image.
I am working on a triplet-loss-based model for this Kaggle competition.
Short description: in this competition, we are challenged to build an algorithm to identify individual whales in images by analyzing a database containing more than 25,000 images gathered from research institutions and public contributors.
https://www.kaggle.com/c/humpback-whale-identification?rvi=1
I have decided to use a Siamese network architecture and train it to give me encodings which I can then use to calculate the distance between two pictures of whales. If this distance is below a particular threshold, the two pictures belong to the same whale; if it is greater, they aren't the same whale.
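A hedged sketch of what I mean by that thresholding (the 0.5 threshold here is purely illustrative, not a value I have settled on):
import numpy as np

def same_whale(enc_a, enc_b, threshold=0.5):
    # squared Euclidean distance between two L2-normalized encodings
    distance = np.sum(np.square(enc_a - enc_b))
    return distance < threshold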
This is the triplet loss function I used (learnt from Andrew Ng's deep learning specialization), but I also normalized the encodings to make the loss more interpretable (easier to determine the margin and split point) across different models, if that makes sense. (I first tried it without the normalization, and when that didn't work I tried normalizing.) I have also tried changing alpha (the margin), varying it from 0.2 to 0.6.
import tensorflow as tf
from tensorflow.nn import l2_normalize as norm_l2

def triplet_loss(y_true, y_pred, alpha=0.3):
    """
    Arguments:
    y_true -- true labels, required when you define a loss in Keras; you don't need it in this function.
    y_pred -- python list containing three objects:
        anchor -- the encodings for the anchor images, of shape (None, 128)
        positive -- the encodings for the positive images, of shape (None, 128)
        negative -- the encodings for the negative images, of shape (None, 128)
    Returns:
    loss -- real number, value of the loss
    """
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
    anchor, positive, negative = norm_l2(anchor), norm_l2(positive), norm_l2(negative)
    # Step 1: Compute the (encoding) distance between the anchor and the positive
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), axis=-1)
    # Step 2: Compute the (encoding) distance between the anchor and the negative
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), axis=-1)
    # Step 3: Subtract the two previous distances and add alpha.
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
    # Step 4: Take the maximum of basic_loss and 0.0. Sum over the training examples.
    loss = tf.reduce_sum(tf.maximum(basic_loss, 0.0))
    return loss
This is an example of one of the model architectures I tried out. I have tried using pretrained FaceNet, ResNet, DenseNet and Xception so far, and I have tried freezing different numbers of layers in each.
# assuming the standard tf.keras imports here (bn is taken to be BatchNormalization)
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Input, BatchNormalization as bn
from tensorflow.keras.optimizers import Adam

R = tf.keras.applications.ResNet50(include_top=False, weights='imagenet', input_shape=(224, 224, 3))
lr = 0.0001
optimizer = Adam(learning_rate=lr)
R.compile(optimizer=optimizer, loss=triplet_loss)

for layer in R.layers[0:30]:
    layer.trainable = False

em_Rmodel = Sequential([
    R,
    GlobalAveragePooling2D(),
    #tf.keras.layers.GlobalMaxPooling2D(),
    Dense(512, activation='relu'),
    bn(),
    Dense(256, activation='sigmoid'),
    Dense(128, activation='sigmoid')
])

def make_tripletModel(model):
    # I was manually changing the input shape to fit the default shape of pretrained networks
    A = Input(shape=(224, 224, 3), name='anchor')
    P = Input(shape=(224, 224, 3), name='anchorPositive')
    N = Input(shape=(224, 224, 3), name='anchorNegative')
    enc_A = model(A)
    enc_P = model(P)
    enc_N = model(N)
    tripletModel = Model(inputs=[A, P, N], outputs=[enc_A, enc_P, enc_N])
    return tripletModel

tripletModel = make_tripletModel(em_Rmodel)
I have been training using semi-hard triplets and have also been augmenting data properly to generate more training images.
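For reference, this is the standard semi-hard criterion I mean (a hedged numpy sketch of the definition, not my actual generator code): a negative is semi-hard when it is farther from the anchor than the positive but still within the margin, i.e. d(a, p) < d(a, n) < d(a, p) + alpha.
import numpy as np

def is_semi_hard(enc_a, enc_p, enc_n, alpha=0.3):
    # squared Euclidean distances in the embedding space
    d_ap = np.sum(np.square(enc_a - enc_p))
    d_an = np.sum(np.square(enc_a - enc_n))
    # semi-hard: the negative is farther than the positive, but within the margin
    return d_ap < d_an < d_ap + alpha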
This is the batch generator I used for training. crop_batch is a function that crops images to show only the whale's tail, which is what one uses to identify individual whales. It uses a DenseNet trained on more than 1000 images with whale tails and the bounding boxes surrounding them. It does the job sufficiently well.
def batch_generator_RN(batch_size=batch_size, ishape=(256, 256, 3), model_input_shape=(224, 224, 3)):
    triplet_generator = get_triplets()
    y_val = np.zeros((batch_size, 2, 1))
    anchors = np.zeros((batch_size, ishape[0], ishape[1], ishape[2]))
    positives = np.zeros((batch_size, ishape[0], ishape[1], ishape[2]))
    negatives = np.zeros((batch_size, ishape[0], ishape[1], ishape[2]))
    while True:
        for i in range(batch_size):
            anchors[i], positives[i], negatives[i] = next(triplet_generator)
        anc = crop_batch(anchors, batch_size=batch_size, img_shape=model_input_shape)
        pos = crop_batch(positives, batch_size=batch_size, img_shape=model_input_shape)
        neg = crop_batch(negatives, batch_size=batch_size, img_shape=model_input_shape)
        x_data = {'anchor': anc,
                  'anchorPositive': pos,
                  'anchorNegative': neg
                  }
        yield (x_data, [y_val, y_val, y_val])
And finally, this, in general, is how I have been training these models. I have tried both reducing and increasing the learning rate; batch_size = 16.
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

lr = 0.0001
optimizer = Adam(learning_rate=lr)
tripletModel.compile(optimizer=optimizer, loss=triplet_loss)
es = EarlyStopping(monitor='loss', patience=20, min_delta=0.05, restore_best_weights=True)
#mc = ModelCheckpoint('Rmodel.h5', monitor='loss', save_best_only=True, save_weights_only=True)
rlr = ReduceLROnPlateau(monitor='loss', min_delta=0.05, factor=0.1, patience=5, verbose=1, min_lr=0)
gen = batch_generator(batch_size)
tripletModel.fit(gen, steps_per_epoch=64, epochs=40, callbacks=[es, rlr])
So after training all these models: in some of them the triplet loss does go down for a while but then plateaus and the model basically learns nothing meaningful (which means that just by looking at the distance between two embeddings I can't figure out whether they are the same whale or not). In other models, the weights converge immediately after the first or second epoch, don't change at all, and the model doesn't learn anything.
I have tried a very wide range of learning rates and I am pretty sure that isn't the problem.
Please tell me if I should add all the code files so you can understand the problem better. The reason I haven't done it yet is that I haven't cleaned the code up, but I will gladly do so if required. Thanks.
When you say that it doesn't learn anything, do you mean that the loss reaches a plateau and thus stops decreasing, or that it does decrease significantly but, when you predict, the embeddings of both same and different whales are similar in value?
The triplet_loss() fn and batch_generator_RN() fn are correct; the problem is not related to the data generation.
However, I suspect that your learning rate is too high while you freeze a lot of layers, i.e. a large number of parameters are not trainable, which may lead to your network being unable to converge.
My suggestion is to unfreeze all the layers, decrease the learning rate to 0.00001, and start training again, regardless of the architecture that you use (Xception/ResNet, etc.).
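A minimal sketch of that change, reusing the names from your code (R, tripletModel, triplet_loss, Adam); 1e-5 is the learning rate I suggest above:
for layer in R.layers:
    layer.trainable = True   # unfreeze the entire backbone
# recompile after changing trainable so the change takes effect
tripletModel.compile(optimizer=Adam(learning_rate=1e-5), loss=triplet_loss)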
I am trying to make an adversarial image for the InceptionV3 model with TensorFlow. For that I use a specific loss on the pixels of my input image. This works well:
from keras import backend as K

model_input_layer = model.layers[0].input
model_output_layer = model.layers[-1].output
cost_function = model_output_layer[0, object_type_to_fake]
gradient_function = K.gradients(cost_function, model_input_layer)[0]
grab_cost_and_gradients_from_model = K.function([model_input_layer, K.learning_phase()], [cost_function, gradient_function])
Now I would like to make only certain pixels trainable, to create a patch on a certain square rather than over the whole input image.
I have tried to use variable = tf.slice(model_input_layer, [0, 100, 100, 0], [-1, 100, 100, -1]) but it does not work.
Has anyone already done this?
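The closest I can sketch so far (this is just an assumption on my part, not working code: hacked_image, learning_rate and the patch coordinates are placeholder names) is to keep the gradient function above but zero out the gradient outside the patch with a binary mask:
import numpy as np

mask = np.zeros((1, 299, 299, 3), dtype=np.float32)   # InceptionV3's default 299x299 input
mask[:, 100:200, 100:200, :] = 1.0                    # hypothetical 100x100 patch region

cost, gradients = grab_cost_and_gradients_from_model([hacked_image, 0])
hacked_image += gradients * mask * learning_rate      # only the patch pixels get updated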
I am trying to use scikit-learn with Keras to fine-tune a model that has one input (images) and two outputs (a rotation vector and a translation vector). The code snippet is below:
from keras.layers import Input
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV

img_input = Input(shape=(img_rows, img_cols, img_channels))
model = KerasRegressor(build_fn=toy_model, verbose=1)
loss_weights = [[1.0, 250.0], [1.0, 500.0], [1.0, 750.0]]
epochs = [10, 20]
batches = [5, 10]
param_grid = dict(loss_weight=loss_weights, epochs=epochs,
                  batch_size=batches)
grid = GridSearchCV(estimator=model, param_grid=param_grid)
grid_result = grid.fit(train_imgs, [train_pose_tx, train_pose_rt])
I want to fine-tune the "loss_weights" parameter for this model. However, I get the following error:
ValueError: Found input variables with inconsistent numbers of samples:[895, 2]
As I understand it, since this model has a single input, this functionality should be supported.
Link to GitHub gist:
https://gist.github.com/sushant4788/1f84cd2781f96fb752ee1f16a56d1bcb
I have an input pipeline similar to the one in the Convolutional Neural Network tutorial. My dataset is imbalanced and I want to use minority oversampling to try to deal with this. Ideally, I want to do this "online", i.e. I don't want to duplicate data samples on disk.
Essentially, what I want to do is duplicate individual examples (with some probability) based on the label. I have been reading a bit on control flow in TensorFlow, and it seems tf.cond(pred, fn1, fn2) is the way to go. I am just struggling to find the right parameterisation, since fn1 and fn2 would need to output lists of tensors, and the lists have to have the same size.
This is roughly what I have so far:
image = image_preprocessing(image_buffer, bbox, False, thread_id)
pred = tf.reshape(tf.equal(label, tf.convert_to_tensor([2])), [])
r_image = tf.cond(pred, lambda: [tf.identity(image), tf.identity(image)], lambda: [tf.identity(image),])
r_label = tf.cond(pred, lambda: [tf.identity(label), tf.identity(label)], lambda: [tf.identity(label),])
However, this raises an error as I mentioned before:
ValueError: fn1 and fn2 must return the same number of results.
Any ideas?
P.S.: this is my first Stack Overflow question. Any feedback on my question is appreciated.
After doing a bit more research, I found a solution for what I wanted to do. What I forgot to mention is that the code mentioned in my question is followed by a batch method, such as batch() or batch_join().
These functions take an argument that allows you to enqueue tensors that already carry a batch dimension of varying size, rather than tensors for a single example. The argument is enqueue_many and should be set to True.
The following piece of code does the trick for me:
for thread_id in range(num_preprocess_threads):
    # Parse a serialized Example proto to extract the image and metadata.
    image_buffer, label_index = parse_example_proto(example_serialized)
    image = image_preprocessing(image_buffer, bbox, False, thread_id)
    # Convert 3D tensor of shape [height, width, channels] to
    # a 4D tensor of shape [batch_size, height, width, channels]
    image = tf.expand_dims(image, 0)
    # Define the boolean predicate to be true when the class label is 1
    pred = tf.equal(label_index, tf.convert_to_tensor([1]))
    pred = tf.reshape(pred, [])
    oversample_factor = 2
    r_image = tf.cond(pred, lambda: tf.concat(0, [image] * oversample_factor), lambda: image)
    r_label = tf.cond(pred, lambda: tf.concat(0, [label_index] * oversample_factor), lambda: label_index)
    images_and_labels.append([r_image, r_label])

images, label_batch = tf.train.shuffle_batch_join(
    images_and_labels,
    batch_size=batch_size,
    capacity=2 * num_preprocess_threads * batch_size,
    min_after_dequeue=1 * num_preprocess_threads * batch_size,
    enqueue_many=True)
I am trying to implement a minimal toy RNN example in TensorFlow.
The goal is to learn a mapping from the input data to the target data, similar to this wonderful concise example in theanets.
Update: we're getting there. The only parts remaining are to make it converge (and to make it less convoluted). Could someone help turn the following into running code or provide a simple example?
import tensorflow as tf
from tensorflow.python.ops import rnn_cell

init_scale = 0.1
num_steps = 7
num_units = 7
input_data = [1, 2, 3, 4, 5, 6, 7]
target = [2, 3, 4, 5, 6, 7, 7]
#target = [1,1,1,1,1,1,1] #converges, but not what we want
batch_size = 1

with tf.Graph().as_default(), tf.Session() as session:
    # Placeholder for the inputs and target of the net
    # inputs = tf.placeholder(tf.int32, [batch_size, num_steps])
    input1 = tf.placeholder(tf.float32, [batch_size, 1])
    inputs = [input1 for _ in range(num_steps)]
    outputs = tf.placeholder(tf.float32, [batch_size, num_steps])

    gru = rnn_cell.GRUCell(num_units)
    initial_state = state = tf.zeros([batch_size, num_units])
    loss = tf.constant(0.0)

    # setup model: unroll
    for time_step in range(num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        step_ = inputs[time_step]
        output, state = gru(step_, state)
        loss += tf.reduce_sum(abs(output - target))  # all norms work equally well? NO!
    final_state = state

    optimizer = tf.train.AdamOptimizer(0.1)  # CONVERGEs sooo much better
    train = optimizer.minimize(loss)  # let the optimizer train

    numpy_state = initial_state.eval()
    session.run(tf.initialize_all_variables())
    for epoch in range(10):  # now
        for i in range(7):  # feed fake 2D matrix of 1 byte at a time ;)
            feed_dict = {initial_state: numpy_state, input1: [[input_data[i]]]}  # no
            numpy_state, current_loss, _ = session.run([final_state, loss, train], feed_dict=feed_dict)
            print(current_loss)  # hopefully going down, always stuck at 189, why!?
I think there are a few problems with your code, but the idea is right.
The main issue is that you're using a single tensor for inputs and outputs, as in:
inputs = tf.placeholder(tf.int32, [batch_size, num_steps]).
In TensorFlow the RNN functions take a list of tensors (because num_steps can vary in some models). So you should construct inputs like this:
inputs = [tf.placeholder(tf.int32, [batch_size, 1]) for _ in range(num_steps)]
Then you need to take care of the fact that your inputs are int32s, but an RNN cell works on float vectors - that's what embedding_lookup is for.
And finally you'll need to adapt your feed to put in the input list.
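A hedged sketch of those three points together (the embedding variable and vocab_size are illustrative names I'm introducing here, not something from your code):
vocab_size = 8  # illustrative: the ids in input_data go up to 7
inputs = [tf.placeholder(tf.int32, [batch_size, 1]) for _ in range(num_steps)]
embedding = tf.get_variable("embedding", [vocab_size, num_units])
# embedding_lookup turns each int32 id into a float vector of size num_units
emb_inputs = [tf.nn.embedding_lookup(embedding, tf.reshape(step, [-1])) for step in inputs]
# the feed then supplies one value per placeholder in the list
feed_dict = {ph: [[input_data[t]]] for t, ph in enumerate(inputs)}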
I think the ptb tutorial is a reasonable place to look, but if you want an even more minimal example of an out-of-the-box RNN you can take a look at some of the rnn unit tests, e.g., here.
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/kernel_tests/rnn_test.py#L164