since I am not very experienced I am struggling with a siamese twin network.
I have 2 images which run trough the same CNN and generate each a distinct feature vector. I would like to train a further network interpreting these two image vectors (each with 32 elements). In an intermediate step I would like to use these vectors as input for a function NCC which is located as a Layer between the CNN and the NN and defined in the following snippet ( i.e. the output should be used for the next NN):
def NCC(a, b):
l=a.shape[1]
av_a=tf.math.reduce_mean(a)
av_b=tf.math.reduce_mean(b)
a=a-av_a
b=b-av_b
norm_a=tf.math.sqrt(tf.math.reduce_sum(a*a))
norm_b=tf.math.sqrt(tf.math.reduce_sum(b*b))
a=a/norm_a
b=b/norm_b
A=tf.reshape(tf.repeat(a, axis=0, repeats=l),(l,l))
B=tf.reshape(tf.repeat(b, axis=0, repeats=l),(l,l))
ncc=Flatten()(A*tf.transpose(B))
return ncc
The output vector (for batchsize=1) should have a 32x32=1024 elements. It seems to work for a batchsize of 1. If I increase the batch size I run into trouble because the input vectors are now tensors with shape=(batch_size,32). I think this is a very stupid question- But how can I circumvent this issue? (It should be noted I wish also to have an output tensor with shape=(batch_size,1024))
Thanks in advance
Mike
Related
I have a data-set without labels, but I do have a way to get pairs of examples with opposite labels, that is given a pair x,z I know that their true labels are either 0,1 or 1,0.
So, I am building a model that accepts pairs of samples as input, and learns to classify them with opposite labels. Assuming I have an arbitrary model for predicting a single sample, y_hat = f(x), I am building a model with Keras that accepts pairs of samples (x,z) and outputs pairs of predictions, f(x), f(z). I then use a custom loss function that drives the model towards the correct direction: Given that a regular binary classifier is trained using the Binary Cross Entropy (BCE) to make the predicted and desired output "close", I use the negative BCE. Also, since BCE is not symmetric, I symmetrize it. So, the loss function I give the model.compile method is:
from tensorflow import keras
bce = keras.losses.BinaryCrossentropy()
def neg_sym_bce(y1, y2):
return (- 0.5 * (bce(y1, y2) + bce(y2, y1)))
My problem is, this model fails to learn to classify even a single pair of my data (I get f(x)~=f(z)~=0.5), and if I try to train it with synthetic "easy" data, it takes hundreds of epochs to converge (also on a single pair).
This made me suspect that it has to do with a "vanishing gradient" problem. Indeed, when I plot (see below) the loss for a single pair, which is a function of 2 variables (the 2 outputs), it is evident that there is a wide plateau around the 0.5, 0.5 point. It is also evident that the global minima is, as expected, around the points 0,1 and 1,0.
So, is there a way to deal with the vanishing gradient here? I read about the problem but the references I found deal with vanishing gradient in the network, not in the loss itself.
Or, is there another loss that can drive the model to predict opposite labels?
Think if your labels are always either 0,1 or 1,1 just use categorical_crossentropy for the loss.
My situation is I have a excel file with 747 nodes (as input) each with a value (imagine 747 columns with floats) and I have an output of 741 values/columns with again floats. These are basically inputs and outputs of a geological simulation. So one row has 747(input)+741(output) = 1488 floats which is one dataset (from one simulation). I have 4 such datasets (rows) to train a neural network such that when I test them on 3 test datasets (747 columns) I get the output of 741 columns. This is just a simple run to get the skeleton of the neural network going before further modifications.
I have come across the Multi-Target Regression example of NYCTaxi (https://github.com/zeahmed/DeepLearningWithMLdotNet/tree/master/NYCTaxiMultiOutputRegression) but I can seem to wrap my head around it.
This is the training set (Input till and including column 'ABS', rest is output):
https://docs.google.com/spreadsheets/d/12TKVbGExt9KcK5RQKTexrToVo8qA5YfeItSaa7E2QdU/edit?usp=sharing
This is the test set:
https://docs.google.com/spreadsheets/d/1-RjyZsdguucCSOr9QTdTp2ehJBqWCr5yz1-aRjQ_4zo/edit?usp=sharing
This is the test Output (To validate) : https://docs.google.com/spreadsheets/d/10O_6711CEpJ4DN1w-kCmW01NikjFVZTDmNRuqO3U_6A/edit?usp=sharing
Any guidance/tips would be well appreciated. TIA!
We can use an AutoEncoder for this task. An AutoEncoder takes in the data, compresses it into a latent representation. Now, this representation vector is used to construct the output variable.
So, you can feed the 747-dimensional vector to the model and generate another 747-dimensional vector which is the output. After proper training, the model will be able to generate the target variables for a given set of inputs.
Saying I have a 2000x100 matrix, I put it into 10 dimension embedding layer, which gives me 2000x100x10 tensor. so it's 2000 examples and each example has a 100x10 matrix. and then, I pass it to a conv1d and KMaxpolling to get 2000x24 matrix, which is 2000 examples and each example has a 24 dimension vector. and now, I would like to recombine those examples before I apply another layer. I would like to combine the first 10 examples together, and such and such, so I get a tuple. and then I pass that tuple to the next layer.
My question is, Can I do that with Keras? and any idea on how to do it?
The idea of using "samples" is that these samples should be unique and not relate to each other.
This is something Keras will demand from your model: if it started with 2000 samples, it must end with 2000 samples. Ideally, these samples do not talk to each other, but you can use custom layers to hack this, but only in the middle. You will need to end with 2000 samples anyway.
I believe you're going to end your model with 200 groups, so maybe you should already start with shape (200,10,100) and use TimeDistributed wrappers:
inputs = Input((10,100)) #shape (200,10,100)
out = TimeDistributed(Embedding(....))(inputs) #shape (200,10,100,10)
out = TimeDistributed(Conv1D(...))(out) #shape (200,10,len,filters)
#here, you use your layer that will work on the groups without TimeDistributed.
To reshape a tensor without changing the batch size, use the Reshape(newShape) layer, where newShape does not include the first dimension (batch size).
To reshape a tensor including the batch size, use a Lambda(lambda x: K.reshape(x,newShape)) layer, where newShape includes the first dimension (batch size) - Here you must remember the warning above: somewhere you will need to undo this change so you end up with the same batch size as the input.
I have downloaded many face images from web. In order to learn Tensorflow I want to feed those images to a simple fully-connected neural network with a single hidden layer. I have found an example code in here.
Since I am a beginner, I don't know how to train, evaluate, and test the network with the downloaded images. The code owner used a '.mat' file and a .pkl file. I don't understand how he organized training and test set.
In order to run the code with my images;
Do I need to divide my images into training, test, and validation folders and turn each folder into a mat file? How am I going to provide labels for the training?
Besides, I don't understand why he used a '.pkl' file?
All in all, I would like to change this code so that I can find test, training , and validation set classification performance with my image dataset.
It might be an easy question, but it is important for me as it is a starting step. Thanks for your understanding.
First, you don't have to use .mat files nor pickles. Tensorflow expects numpy array.
For instance, let's say you have 70000 images of size 28x28 (=784 dimensions) belonging to 10 classes. Let's also assume that you'd like to train a simple feedforward neural network to classify the images.
The first step would be to split the images between train and test (and validation, but let's put this aside for the sake of simplicity). For the sake of the example, let's imagine that you chose randomly 60000 images for your training set and 10000 for your test set.
The second step would be to ensure that your data has the right format. Here, you'd like your training set to consist in one numpy array of shape (60000, 784) for the images and another one of shape (60000, 10) for the labels (if you use one-hot encoding to represent your classes). As for your test set, you should have an array of shape (10000, 784) for the images and one of shape (10000, 10) for the labels.
Once you have these big numpy arrays, you should define placeholders that will allow you to feed data to you network during training and evaluation.
images = tf.placeholder(tf.float32, shape=[None, 784])
labels = tf.placeholder(tf.int64, shape=[None, 10])
The None here means that you can feed a batch of any size, i.e. as many images as you want, as long as you numpy array is of shape (anything, 784).
The third step consists in defining your model as well as the loss function and the optimizer.
The fourth step consists in training your network by feeding it with random batches of data using the placeholders created above. As your network is training, you can periodically print its performance like the training loss/accuracy as well as the test loss/accuracy.
You can find a complete and very simple example here.
I want to create a network that has specific fixed connections between layers.
For example,
Sparsely connected neural network
I tried looking into functions in Tensorflow, but I only found dense networks with regularizers, which doesn't function as I want.
If it's not possible in tensorflow, then please suggest some other library that can be used. Thanks!
You can always find a workaround. Let's say a layer does y = xW (Wx is also correct) but you want some of the entries in W always be zeros. You can do it column-wise:
For column i (or element i since y is a vector) of the output, y_i = x * D_i * W_i. The matrix D_i is a constant diagonal matrix (tf.constant, tf.diag) that controls what element would be zeros.
Then you can use tf.concat to combine all y_i to matrix Y.
You can abstract this into a function whose signature may look like def sparse_layer(input_layer, gates_matrix, activation_f, ...) which returns the output layer.