I have the following task: I'm supposed to find the coordinates of a target point. The given features are the distances from several anchors to that target point. See image 1 (distances from anchors to target).
I planned to start with a simple neural network that has just an input and an output layer. The cost function I'm trying to minimize is the mean squared error: mean(square(correct_coordinate - summed_up_distances*weights)).
But now I'm stuck on how to model the neural network so that it outputs coordinates [x, y]; the current model only outputs a single value. See image 2 (current model).
Right now I would just train 2 neural networks: one that outputs the x value and one that outputs the y value.
I'm just not sure whether that is best practice with TensorFlow.
So I would like to know: how would you model the NN in TensorFlow?
You can build a single network with 2 nodes in the output layer (one for x, one for y). There is no need to train 2 neural networks for the same task.
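A minimal Keras sketch of that idea, assuming 4 anchors (N_ANCHORS is a hypothetical value) and randomly generated anchor and target positions used only to check shapes. The hidden layer is optional; the key part is the final Dense(2), which produces [x, y] directly, and the "mse" loss averages the squared error over both coordinates.

```python
import numpy as np
import tensorflow as tf

N_ANCHORS = 4  # hypothetical: number of anchors (input features)

# A single network: input = distances to each anchor, output = [x, y].
model = tf.keras.Sequential([
    tf.keras.Input(shape=(N_ANCHORS,)),
    tf.keras.layers.Dense(32, activation="relu"),  # optional hidden layer
    tf.keras.layers.Dense(2),                      # two linear output nodes: x and y
])
# "mse" averages the squared error over both coordinates (and the batch).
model.compile(optimizer="adam", loss="mse")

# Synthetic demo data (random positions), only to show the expected shapes.
rng = np.random.default_rng(0)
anchors = rng.uniform(0, 10, size=(N_ANCHORS, 2))
targets = rng.uniform(0, 10, size=(256, 2)).astype("float32")
distances = np.linalg.norm(
    targets[:, None, :] - anchors[None, :, :], axis=-1
).astype("float32")  # shape (256, N_ANCHORS)

model.fit(distances, targets, epochs=5, verbose=0)
predicted_xy = model.predict(distances[:3], verbose=0)  # shape (3, 2): one [x, y] per row
```

The targets array simply has two columns; Keras matches them against the two output nodes, so no second network is needed.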
Since I am not very experienced, I am struggling with a Siamese (twin) network.
I have 2 images that run through the same CNN, each generating its own feature vector. I would like to train a further network that interprets these two image vectors (each with 32 elements). As an intermediate step, I would like to use these vectors as input to a function NCC, which sits as a layer between the CNN and the NN and is defined in the following snippet (i.e. its output should be fed to the next NN):
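import tensorflow as tf
from tensorflow.keras.layers import Flatten

def NCC(a, b):
    # Written for batch size 1: a and b have shape (1, l).
    l = a.shape[1]
    # Subtract the mean (taken over the whole tensor, not per sample).
    av_a = tf.math.reduce_mean(a)
    av_b = tf.math.reduce_mean(b)
    a = a - av_a
    b = b - av_b
    # Scale each vector to unit L2 norm (again over the whole tensor).
    norm_a = tf.math.sqrt(tf.math.reduce_sum(a * a))
    norm_b = tf.math.sqrt(tf.math.reduce_sum(b * b))
    a = a / norm_a
    b = b / norm_b
    # Tile each vector into an (l, l) matrix; A * B^T gives the outer product b_i * a_j.
    A = tf.reshape(tf.repeat(a, axis=0, repeats=l), (l, l))
    B = tf.reshape(tf.repeat(b, axis=0, repeats=l), (l, l))
    ncc = Flatten()(A * tf.transpose(B))
    return ncc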
The output vector (for batch size 1) should have 32x32 = 1024 elements. It seems to work for a batch size of 1, but if I increase the batch size I run into trouble because the input vectors are now tensors with shape (batch_size, 32). This may be a very basic question, but how can I get around this issue? (Note that I also want the output tensor to have shape (batch_size, 1024).)
Thanks in advance
Mike
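One way to handle batches, sketched under the assumption that both inputs have shape (batch_size, 32): compute the mean and the norm per sample (axis=1) instead of over the whole tensor, and build the per-sample outer product with tf.einsum. The name ncc_batched is hypothetical, and tf.math.l2_normalize is swapped in for the manual square-root division (it also guards against division by zero).

```python
import tensorflow as tf

def ncc_batched(a, b):
    """Batch-aware NCC: a, b have shape (batch_size, l); returns (batch_size, l*l)."""
    l = a.shape[-1]
    # Subtract each sample's own mean (axis=1) instead of the mean of the whole batch.
    a = a - tf.reduce_mean(a, axis=1, keepdims=True)
    b = b - tf.reduce_mean(b, axis=1, keepdims=True)
    # Scale each sample to unit L2 norm.
    a = tf.math.l2_normalize(a, axis=1)
    b = tf.math.l2_normalize(b, axis=1)
    # Per-sample outer product: out[n, i, j] = b[n, i] * a[n, j],
    # which matches A * tf.transpose(B) in the batch-size-1 version.
    outer = tf.einsum("ni,nj->nij", b, a)
    return tf.reshape(outer, (-1, l * l))

# Usage with random stand-ins for two batches of 32-element CNN feature vectors.
feat_a = tf.random.normal((8, 32))
feat_b = tf.random.normal((8, 32))
ncc = ncc_batched(feat_a, feat_b)  # shape (8, 1024)
```

Inside a functional model it could be wrapped, for example, as tf.keras.layers.Lambda(lambda t: ncc_batched(t[0], t[1])) applied to the list of the two CNN outputs; this has not been checked against your exact architecture.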
Suppose I have vectors of dimension 1 x N, {X_1 ... X_n} and {X_1' ... X_n'}, where each X_i and X_i' are related but the relation cannot be modeled by a function. I want to train a neural network that takes X_i as input and outputs Y_i of dimension N x 1 such that norm(X_i' Y_i) is maximized. The constraint is that Y_i has norm 1 (otherwise the network could simply make the entries of Y_i as large as possible).
I do not use X_i' as inputs because they are not available in real use. I hope that when I test the neural network by feeding it {X_n+1 ... X_k}, it will output {Y_n+1 ... Y_k} such that norm(X_j' Y_j) is maximized for each j. Again, note that I only have {X_n+1' ... X_k'} during testing, not in the real deployment where the neural network will be used.
I tried defining custom TensorFlow/Keras loss functions, but they don't seem to work. I also tried using a neural network to first predict X_i' from X_i, but the performance is not very good.
The difficulty is defining a loss function that has no labels and making the network backpropagate through that loss. Any ideas how this could be achieved?
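One possible approach, sketched under stated assumptions: because X_i' is available during training, it can be handed to Keras in the "target" slot even though it is not a label, and a custom loss can return -|X_i' Y_i|. The unit-norm constraint is enforced by a final layer that L2-normalises the output, so the network cannot raise the objective by scaling Y_i. N = 16, the layer sizes, the UnitNorm layer and the synthetic X / X' data are hypothetical placeholders.

```python
import numpy as np
import tensorflow as tf

N = 16  # hypothetical vector length

class UnitNorm(tf.keras.layers.Layer):
    """Rescales each output vector to L2 norm 1 (the ||Y_i|| = 1 constraint)."""
    def call(self, inputs):
        return tf.math.l2_normalize(inputs, axis=-1)

def neg_abs_projection(x_prime, y_pred):
    """Label-free loss: X_i' is passed in the 'y_true' slot, y_pred is Y_i.

    X_i' Y_i is a scalar per sample; maximising its magnitude is the same as
    minimising its negative.
    """
    return -tf.abs(tf.reduce_sum(x_prime * y_pred, axis=-1))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(N,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(N),
    UnitNorm(),
])
model.compile(optimizer="adam", loss=neg_abs_projection)

# Synthetic stand-in data: X is the input, X' is only used inside the loss.
rng = np.random.default_rng(0)
X = rng.normal(size=(512, N)).astype("float32")
X_prime = np.tanh(X @ rng.normal(size=(N, N))).astype("float32")
model.fit(X, X_prime, epochs=5, batch_size=32, verbose=0)

Y = model.predict(X[:4], verbose=0)  # each row has unit norm
```

At prediction time only X is needed. Note that for a unit-norm Y_i the best possible value is reached at Y_i = ±X_i'/||X_i'|| (Cauchy-Schwarz), so the network is effectively learning the direction of X_i' from X_i; how well that generalises depends on how much X_i tells you about X_i'.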
There are many examples of how to build a TensorFlow model that recognises cats and dogs from images. Now suppose I have audio associated with each picture and train a separate network to recognise cats and dogs by sound.
I want to feed the predictions of both networks into another layer to combine the results and increase the final prediction success rate.
What should my model look like?
Create two neural networks: given an image-audio pair, feed each element to its corresponding network.
After the convolution steps (or whatever layers you want to use), proceed as you would with a normal CNN. In the last step, where you flatten the data before passing it to a feed-forward network (FNN), do the same with the output of the audio network.
So, as an example, if the flattened output of the image network has 2048 elements and that of the audio network has 4096, just concatenate the two and give the first layer of the FNN an input size equal to their sum, 2048 + 4096 = 6144.
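A sketch of this concatenation approach with the Keras functional API. The two Input layers are hypothetical stand-ins for the flattened outputs of the image and audio networks (2048 and 4096 features, as in the example above); in a real model you would connect the two CNN branches in their place. The hidden size of 256 and the 2-class softmax output (cat vs dog) are assumptions.

```python
import tensorflow as tf

# Hypothetical stand-ins for the flattened outputs of the two networks.
image_features = tf.keras.Input(shape=(2048,), name="image_features")
audio_features = tf.keras.Input(shape=(4096,), name="audio_features")

# Append the two feature vectors: 2048 + 4096 = 6144 inputs to the FNN.
merged = tf.keras.layers.Concatenate(axis=-1)([image_features, audio_features])
hidden = tf.keras.layers.Dense(256, activation="relu")(merged)
probs_out = tf.keras.layers.Dense(2, activation="softmax")(hidden)  # cat vs dog

fusion_model = tf.keras.Model(inputs=[image_features, audio_features], outputs=probs_out)

# Shape check with dummy inputs (3 image/audio pairs).
probs = fusion_model.predict([tf.zeros((3, 2048)), tf.zeros((3, 4096))], verbose=0)
```

Because both branches end up in one model, the image network, the audio network and the combining layers can be trained jointly with a single loss.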
I have a question regarding convolutional neural network (CNN) training.
I have managed to train a network using TensorFlow that takes an input image (1600 pixels) and outputs the one of three classes that matches it.
Testing the network with variations of the trained classes gives good results. However, when I give it a different, fourth image (one that does not contain any of the 3 trained classes), it always returns an arbitrary match to one of the classes.
My question is: how can I train a network to recognise that an image does not belong to any of the three trained classes? As a similar example, suppose I trained a network on the MNIST database and then gave it the character "A" or "B". Is there a way to detect that the input does not belong to any of the classes?
Thank you
Your model will always make predictions from your set of labels. For example, if you train your model on MNIST data, its predictions will always be 0-9, just like the MNIST labels.
What you can do is first train a separate 2-class model that predicts whether an image belongs to data set A or B. E.g. for MNIST data, label all MNIST images as 1, add images from other sources that are different (not 0-9) and label them as 0, then train a model to decide whether an image belongs to MNIST or not.
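A sketch of how the two models could be combined at prediction time, assuming the gate model outputs the probability that an input belongs to the known data set and the original classifier outputs softmax probabilities. The function gated_predict, the 0.5 threshold and the example probabilities are hypothetical.

```python
import numpy as np

def gated_predict(p_known, class_probs, threshold=0.5):
    """Combine a binary 'known data set?' model with the original classifier.

    p_known:     shape (batch,), gate model's probability that the input is from the known data set
    class_probs: shape (batch, n_classes), softmax output of the original classifier
    Returns the predicted class index, or -1 for 'none of the known classes'.
    """
    labels = np.argmax(class_probs, axis=1)
    return np.where(p_known >= threshold, labels, -1)

# Hypothetical model outputs for three inputs.
p_known = np.array([0.9, 0.2, 0.7])
class_probs = np.array([[0.1, 0.8, 0.1],
                        [0.6, 0.3, 0.1],
                        [0.2, 0.2, 0.6]])
preds = gated_predict(p_known, class_probs)  # class 1, rejected (-1), class 2
```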
After training, a convolutional neural network (CNN) predicts one of the defined classes; it always returns one of them, regardless of how confident it is. I have faced a similar problem. What you can do is check the predicted probability (confidence) of the winning class: if it is below some threshold, assign the input to a "none" category. Hope this helps.
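A minimal NumPy sketch of this thresholding, assuming the model's last layer is a softmax. The function predict_with_reject and the threshold of 0.9 are hypothetical; in practice the threshold would need tuning on held-out data, and softmax probabilities can be high even for unfamiliar inputs, so this is a heuristic rather than a guarantee.

```python
import numpy as np

def predict_with_reject(class_probs, threshold=0.9):
    """Return the arg-max class, or -1 ('none') when the top softmax probability is too low."""
    top_prob = np.max(class_probs, axis=1)
    labels = np.argmax(class_probs, axis=1)
    return np.where(top_prob >= threshold, labels, -1)

# Hypothetical softmax outputs: one confident prediction, one uncertain one.
class_probs = np.array([[0.97, 0.02, 0.01],
                        [0.40, 0.35, 0.25]])
preds = predict_with_reject(class_probs)  # first row -> class 0, second row -> -1
```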
You probably have three output nodes and choose the one with the maximum value (one-hot encoding). That's a bit unfortunate, as it's a low number of outputs: non-recognised inputs tend to cause fairly random outputs.
Now, with 3 outputs, roughly speaking you can get 7 outcomes: a single high value (3 possibilities), two high values (3 possibilities), or approximately equal outputs (1 possibility). A non-recognised input can produce any of these, so there's a decent chance (~3/7) of a random input producing a pattern on the output nodes that you'd only expect for a recognised input.
Now, if you had 15 classes and thus 15 output nodes, you'd be looking at roughly 2^15 - 1 = 32767 possible outcomes for unrecognised inputs, only 15 of which correspond to the expected one-hot outcomes.
Underlying this is a lack of training data. If your training set has examples outside the 3 classes, you can just put them in a 4th "other" category and train with that. This by itself isn't a reliable indicator, since the theoretical "other" set is usually huge, but you now have 2 complementary ways of detecting other inputs: either via the "other" output node or via one of the 11 ambiguous output patterns (4 output nodes give 2^4 - 1 = 15 patterns, 4 of which are one-hot).
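The counts in this answer follow from treating each output node as either "high" or "low": k nodes give 2^k - 1 non-empty sets of high nodes (all nodes high corresponds to all outputs roughly equal), of which only k are one-hot. A small helper (output_patterns is a hypothetical name) reproduces the numbers used here:

```python
def output_patterns(k):
    """Rough count from the answer: each of k output nodes is either 'high' or 'low'."""
    total = 2 ** k - 1        # non-empty sets of 'high' nodes (all high = all roughly equal)
    one_hot = k               # exactly one high node: the pattern of a recognised input
    ambiguous = total - one_hot
    return total, one_hot, ambiguous

print(output_patterns(3))   # (7, 3, 4)
print(output_patterns(4))   # (15, 4, 11): 3 classes + "other"
print(output_patterns(15))  # (32767, 15, 32752)
```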
Another solution would be to check what output your CNN usually gives when it is shown something else. I believe the last layer is a softmax, so your CNN returns probabilities for the three given classes. If none of these probabilities is close to 1, this might be a sign that the input is something else, assuming your CNN is well trained (it must be penalised for overconfidence when predicting wrong labels).
I want to train a convolutional neural network with TensorFlow to do multi-output multi-class classification.
For example: take the MNIST sample set, always combine two random images into a single one, and then classify the resulting image. The result of the classification should be the two digits shown in the image.
So the output of the network could have the shape [-1, 2, 10], where the first dimension is the batch, the second represents the output position (first or second digit), and the third is the "usual" classification of the digit shown.
I have been googling this for a while but wasn't able to find anything useful. Also, I don't know whether "multi-output multi-class classification" is the correct name for this task. If not, what is the correct name? Do you have any links/tutorials/documentation/papers explaining what I'd need to do to build the loss function and training operations?
What I tried was to split the output of the network into the individual outputs with tf.split and then apply softmax_cross_entropy_with_logits to each output. I averaged the result over all outputs, but it doesn't seem to work. Is this even a reasonable approach?
For the nomenclature of classification problems, you can have a look at this link:
http://scikit-learn.org/stable/modules/multiclass.html
So your problem is called "Multilabel Classification". In normal TensorFlow multiclass classification (classic MNIST) you have 10 output units and use softmax at the end to compute the loss, i.e. "tf.nn.softmax_cross_entropy_with_logits".
E.g. if your image shows "2", the ground truth will be [0,0,1,0,0,0,0,0,0,0]
But here, your network output will have 20 units and you will use sigmoid, i.e. "tf.nn.sigmoid_cross_entropy_with_logits".
E.g. if your image shows "2" and "4", the ground truth will be [0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0], i.e. the first ten bits represent the first digit's class and the second ten bits represent the second digit's class.
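A small sketch of how such a 20-element target could be built from the two digits; two_digit_target is a hypothetical helper. Because the position of each 1 encodes which digit slot it belongs to, an image showing the same digit twice (e.g. "3" and "3") still gets two distinct bits set.

```python
def two_digit_target(first, second):
    """20-element target: [0:10] one-hot first digit, [10:20] one-hot second digit."""
    target = [0] * 20
    target[first] = 1
    target[10 + second] = 1
    return target

print(two_digit_target(2, 4))
# [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
```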
First, give each composite image two labels, one for each of the images it is made of. Then change your loss function so that it maximizes the outputs for the two given labels, and train your model. I don't think you need to split the outputs.
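For comparison, a sketch of the formulation from the question itself ([-1, 2, 10] logits with one softmax per digit position), assuming the labels are given as two integer digits per image. It also avoids tf.split: tf.nn.sparse_softmax_cross_entropy_with_logits applies the softmax over the last axis for every leading index. two_digit_loss is a hypothetical name.

```python
import tensorflow as tf

def two_digit_loss(digit_labels, logits):
    """Softmax cross-entropy per digit position, averaged over positions and batch.

    digit_labels: int tensor of shape (batch, 2), the true digit in each position
    logits:       float tensor of shape (batch, 20), raw network output
    """
    logits = tf.reshape(logits, (-1, 2, 10))  # (batch, position, class)
    # Softmax is taken over the last axis for every (batch, position) pair,
    # so the two outputs do not need to be separated with tf.split.
    per_position = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=digit_labels, logits=logits)  # shape (batch, 2)
    return tf.reduce_mean(per_position)

# With all-zero logits every softmax is uniform, so the loss equals log(10).
labels = tf.constant([[2, 4], [7, 7]])
loss = two_digit_loss(labels, tf.zeros((2, 20)))
print(float(loss))  # about 2.302585
```

In Keras, a comparable setup should be a final Dense(20) followed by Reshape((2, 10)), compiled with tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True).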