I want to implement this. Here X1, X2, and X3 are three datasets of the same size (2303*12). Every dataset goes to an ANN together with the previous dataset's cell state, for example LSTM(ANN(x2+c(t-1))). The output will be many-to-many: we will get an output for every dataset. We want to make recurrent connections between these datasets. How can I implement this in Keras?
I have five classes and I want to compare four of them against one and the same class. This isn't a One vs Rest classifier, as for each output I want to score them against one base class.
The four outputs should be: base class vs classA, base class vs classB, etc.
I could do this by having multiple binary classification tasks, but that's wasting computation time if the first layers are BERT preprocessing + pretrained BERT layers, and the only differences between the four classifiers are the last few layers of BERT (finetuned ones) and the Dense layer.
So why not merge the graphs for more performance?
My inputs are four different datasets, each annotated with true/false for each class.
As I understand it, I can re-use most of the pipeline (BERT preprocessing and the first layers of BERT), as those have shared weights. I should then be able to train the last few layers of BERT and the Dense layer on top differently depending on the branch of the classifier (maybe using something like keras.switch?).
I have tried many alternative options including multi-class and multi-label classifiers, with actual and generated (eg, machine-annotated) labels in the case of multiple input labels, different activation and loss functions, but none of the results were acceptable to me (none were as good as the four separate models).
Is there a solution for merging the four different models for more performance, or am I stuck with using 4x binary classifiers?
When you train a DNN for a specific task, it will (in the vast majority of cases) be better than a more general model that handles several tasks simultaneously. That said, in my experience a properly trained general model produces results very similar to the original binary ones. Anyway, here are a couple of suggestions for training strategies (assuming your training datasets for each task are completely different):
Weak supervision approach
Train your binary classifiers and label your datasets with them (i.e. use the binary classifier trained on dataset 2 to label datasets [1,3,4]). Then train your joint model as a multilabel task on all the newly labeled datasets (don't forget to shuffle the samples before feeding them to the trainer ;) ). You will need to experiment with whether to apply a threshold and set each label to 0/1 or to use the binary classifiers' scores directly.
Create a custom loss function that does not penalize the model when no information is provided for a certain class. So when you feed in a sample from (say) dataset 2, the loss is calculated only for the 2nd class.
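A minimal sketch of such a masked loss, written in NumPy so the arithmetic is easy to follow. The function name and the convention of marking unknown labels with -1 are assumptions for illustration, not part of any library; the same logic would have to be ported to the framework's tensor ops for real training.

```python
import numpy as np

def masked_binary_crossentropy(y_true, logits, missing=-1.0):
    """Mean binary cross-entropy over labelled entries only.

    Entries of y_true equal to `missing` (an assumed sentinel) contribute nothing.
    """
    y_true = np.asarray(y_true, dtype=float)
    logits = np.asarray(logits, dtype=float)
    mask = (y_true != missing).astype(float)
    # Numerically stable BCE on logits: max(z, 0) - z*y + log(1 + exp(-|z|))
    per_entry = (np.maximum(logits, 0.0) - logits * y_true
                 + np.log1p(np.exp(-np.abs(logits))))
    # Zero out unlabelled entries and average over the labelled ones.
    return float((per_entry * mask).sum() / max(mask.sum(), 1.0))

# Two samples from "dataset 2": only the 2nd of four classes carries a label.
y_true = np.array([[-1, 1, -1, -1],
                   [-1, 0, -1, -1]], dtype=float)
logits = np.array([[5.0, 2.0, -3.0, 0.5],
                   [-4.0, -1.0, 7.0, 0.0]])
loss = masked_binary_crossentropy(y_true, logits)
```

Because the mask zeroes the other columns, whatever the model outputs for classes 1, 3, and 4 on these samples has no effect on the loss.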
Of course you can apply both simultaneously. For example, if you know that the binary classifiers produce polarized scores (most results are near 0 or 1), you can use weak labels and automatically label your data with the scores. Then, during the second stage, weight each sample's loss by x' = 4(x-0.5)^2, where x is the binary classifier's score (note that you get logits from the model, so you will need to apply the sigmoid function first). This way you increase the contribution of the samples the binary classifier is confident about and reduce that of the less certain ones.
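To make the x' = 4(x-0.5)^2 term concrete, here is a small NumPy sketch. The helper names are made up for illustration, and it assumes the term is used as a per-sample loss weight.

```python
import numpy as np

def sigmoid(logit):
    """Turn a raw model logit into a score in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(logit, dtype=float)))

def confidence_weight(score):
    """x' = 4 * (x - 0.5)^2: 0 for an undecided score of 0.5, 1 for a score of 0 or 1."""
    return 4.0 * (np.asarray(score, dtype=float) - 0.5) ** 2

scores = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
weights = confidence_weight(scores)            # [1.0, 0.25, 0.0, 0.25, 1.0]

# Starting from logits instead of scores: a logit of 0 is a score of 0.5, so weight 0.
weight_from_logit = confidence_weight(sigmoid(0.0))
```

Multiplying each sample's loss by its weight lets the confidently labeled samples dominate the second training stage.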
As for releasing the last layers of BERT, unfreezing the upper 3-6 layers is usually enough. Releasing more layers improves results very little and increases time and memory requirements.
I am trying to implement the technique described in the MSG-GAN paper:
https://arxiv.org/pdf/1903.06048.pdf
But I am having difficulty understanding some things, for example, how are the connections made from the generator to the discriminator? These are Conv2D connections literally? (in that case, how would I insert the real images to train the discriminator?) Or does the discriminator have multiple outputs (one prediction for each resolution and the generator has to optimize the average loss of the resolutions)?
How are the connections made from the generator to the discriminator?
The generator outputs an image at each resolution, and the discriminator concatenates each of them with the feature maps coming from its previous block at the input layer of the corresponding resolution.
These are Conv2D connections literally?
No, they are just input tensors with a shape like (batch size, W, H, 3), the same as an ordinary image input.
Does the discriminator have multiple outputs?
No. This is end-to-end training with all resolution outputs at the same time; otherwise it would be just like the Progressive Growing GAN, and there would be no reason for the concatenate operation at each input layer (the beginning of each block) of the discriminator.
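As a rough illustration of how real images can enter the same multi-scale inputs, the NumPy sketch below builds a pyramid of downsampled real images and concatenates one level with feature maps of matching resolution. The 2x2 average pooling, the channel count of 64, and all names are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def downsample2x(images):
    """Halve height and width by 2x2 average pooling; images is (batch, H, W, C)."""
    b, h, w, c = images.shape
    return images.reshape(b, h // 2, 2, w // 2, 2, c).mean(axis=(2, 4))

# A batch of real 32x32 RGB images and its lower-resolution copies (16x16, 8x8).
real = np.random.rand(8, 32, 32, 3)
real_pyramid = [real, downsample2x(real), downsample2x(downsample2x(real))]

# Hypothetical feature maps leaving the discriminator block that works at 32x32,
# already reduced to 16x16 and about to enter the next block.
features_16 = np.random.rand(8, 16, 16, 64)

# The next block sees the feature maps and the 16x16 image stacked along channels.
block_input = np.concatenate([features_16, real_pyramid[1]], axis=-1)
print(block_input.shape)  # (8, 16, 16, 67)
```

For fake samples, the generator's own intermediate outputs would take the place of `real_pyramid`, so the discriminator's input signature is the same in both cases.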
This is only a partial answer.
I would say that if you have to implement this in Keras and you don't want both models (G and D) combined in one piece, it's actually easier to keep them separate and then use tf.GradientTape() to train them.
Does the discriminator have multiple outputs?
Yes. If you implement them as separate models, there will be multiple inputs and multiple outputs at multiple resolutions, and only one of those is the final output.
I am using a fully connected network with 4 input and 2 output nodes. I store the weights of my network after fully training it. Suppose this is my weight matrix:
`W = np.array([[0.8,0.02],[0.5,0.4],[0.3,0.2],[0.1,0.7]])`
I want to visualize which weights each class has learned. How can I do that? The code I found for this uses plt.imshow. Should I simply call plt.imshow(W) to visualize the weights?
You should use TensorBoard for this. Also, you should not need to store the weights manually, as they are stored by TensorFlow. You can access them in a couple of different ways, such as with tf.trainable_variables(), or tape.watched_variables() in eager mode. Then it's just a matter of looping through the variables for the weights you want.
To plot your weights in TensorBoard, check out this: https://www.tensorflow.org/api_docs/python/tf/contrib/summary
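For a matrix as small as the W in the question, a plain heatmap may already be enough. The sketch below assumes rows correspond to the 4 inputs and columns to the 2 output classes; the tick labels and the output file name are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np

W = np.array([[0.8, 0.02], [0.5, 0.4], [0.3, 0.2], [0.1, 0.7]])

fig, ax = plt.subplots()
# One cell per weight: row = input node, column = output class (assumed layout).
im = ax.imshow(W, cmap="viridis", aspect="auto")
ax.set_xticks(range(W.shape[1]))
ax.set_xticklabels([f"class {j}" for j in range(W.shape[1])])
ax.set_yticks(range(W.shape[0]))
ax.set_yticklabels([f"input {i}" for i in range(W.shape[0])])
fig.colorbar(im, ax=ax, label="weight")
fig.savefig("weights.png")
```

Reading down a column then shows which inputs a given class weights most heavily.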
My challenge is to train a neural network to recognize certain actions and events for different classes of task (or whatever you want to call them), given the input.
I see that most of the input/output when training neural networks is either 0 or 1 or [0,1]. But in my scenario I want my input to be in the form of integers which are arbitrarily big and the same form is expected for the output.
Let me give you an example:
Input
X = [ 23, 4, 0, 1233423, 1, 0, 0] ->
Y = [ 2, 1, 1]
Each element X[i] represents a different property of the same entity.
Let's say I want to describe a human being:
23 -> maps to a place he/she was born
4 -> maps to a school they graduated
etc.
Each entry Y[i], on the other hand, says what the human is most likely to do in 3 different categories (as len(Y) is 3 in this case):
Y[0] = 2 -> maps to eating icecream ( from a variety of other choices )
Y[1] = 1 -> maps to a time of day moment ( morning, noon, afternoon, evening, etc...)
Y[2] = 1 -> maps to a day of the week for example
Now of course, if the task were just a multi-label problem I would apply a sigmoid on the output layer and use binary_crossentropy as the loss function, but that does not work here because my output is obviously not in [0,1].
Also, I am not really sure what loss function to apply, since I want all classes/subclasses in Y to be correctly predicted. What I am basically saying is that each Y[i] is a class of its own.
It would be more accurate if my output was in the shape of (3, labels_per_class)
and the loss function would calculate a loss for each of the 3 different classes
trying to optimize the result in such a way that each of the 3 classes would have the correct labels.
I am not sure if that is possible or how at least.
I am really still in the beginnings with my neural network knowledge and learning so clearly I am struggling with this problem.
To put it more simply: it is more or less like an auto-encoder, but the inputs and outputs are integers. The difference is that in my case the output has a different size from the input, whereas in an auto-encoder they are the same.
My solution was to apply a relu at the output layer (and of course relu-like activations on all other layers as well) and binary_crossentropy as the loss function, but the accuracy of the network is very low, around 15%.
For a standard classification you would probably do a dense layer with a number of nodes equal to the number of classes then apply softmax. The loss would be tf.losses.softmax_cross_entropy. You would do a sigmoid if you want to allow multiple classes, not just one.
Now you have multiple classification tasks. One way to do it is to take the last hidden layer (the one before the layer where you would do softmax). For each task, add a dense layer with a number of nodes equal to the number of classes for that task and apply softmax. To compute the loss, just add the per-task losses together.
If the tasks are too different you may want to have more than one layer for each prediction.
You can also put some weights on the different losses if, say, eating ice-cream is a lot more important than getting the time of day right.
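The multi-head idea above can be sketched in NumPy as follows. The class counts, the task weights, and the random hidden activations are placeholders chosen only for illustration; in a real model the heads would be trainable dense layers on top of the shared network.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the correct integer labels."""
    return float(-np.log(probs[np.arange(len(labels)), labels]).mean())

batch, hidden_units = 5, 16
hidden = rng.normal(size=(batch, hidden_units))  # stand-in for the last shared hidden layer

classes_per_task = [4, 5, 7]    # hypothetical: activities, times of day, weekdays
task_weights = [2.0, 1.0, 1.0]  # e.g. the activity matters twice as much
labels = [rng.integers(0, k, size=batch) for k in classes_per_task]

# One dense head per task, each followed by its own softmax.
heads = [rng.normal(scale=0.1, size=(hidden_units, k)) for k in classes_per_task]
probs = [softmax(hidden @ w) for w in heads]

# Per-task losses are combined into one weighted sum to optimize.
losses = [cross_entropy(p, y) for p, y in zip(probs, labels)]
total_loss = sum(w * l for w, l in zip(task_weights, losses))

# The prediction is one integer per task, like Y = [2, 1, 1] in the question.
predictions = [p.argmax(axis=-1) for p in probs]
```

Each head outputs a probability distribution over its own label set, so the integer targets never need to be squeezed into a single [0,1] output.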
Only use relu if the prediction space is continuous. Say time of day is continuous, but the choice between eating ice-cream, going to work, and watching TV is not. If you use relu, use a loss like L1 (tf.losses.absolute_difference) or L2 (tf.losses.mean_squared_error).
I have a question regarding convolutional neural network (CNN) training.
I have managed to train a network using tensorflow that takes an input image (1600 pixels) and outputs the one of three classes that matches it.
Testing the network with variations of the trained classes gives good results. However, when I give it a different, fourth image (one that does not contain any of the 3 trained images), it always returns a random match to one of the classes.
My question is: how can I train a network to recognize that an image does not belong to any of the three trained classes? As a similar example, suppose I trained a network on the MNIST database and then gave it the character "A" or "B". Is there a way to detect that the input does not belong to any of the classes?
Thank you
Your model will always make predictions from the set of labels it was trained on. For example, if you train your model on MNIST data, its predictions will always be 0-9, just like the MNIST labels.
What you can do is first train a different model with 2 classes that predicts whether an image belongs to the data set or not. E.g. for MNIST, label all MNIST data as 1, add data from other sources that is different (not 0-9), and label it as 0. Then train a model to decide whether an image belongs to MNIST or not.
A Convolutional Neural Network (CNN) predicts the result from the classes defined during training, so it always returns one of those classes regardless of how confident it is. I have faced a similar problem. What you can do is check the confidence of the prediction (the highest output probability): if it is below some threshold value, treat the input as belonging to a "none" category. Hope this helps.
You probably have three output nodes, and choose the maximum value (one-hot encoding). That's a bit unfortunate as it's a low number of outputs. Non-recognized inputs tend to cause pretty random outputs.
Now, with 3 outputs, roughly speaking you can get 7 outcomes. You might get a single high value (3 possibilities), but non-recognized input can also cause 2 high outputs (also 3 possibilities) or approximately equal outputs (1 possibility). So there's a decent chance (~ 3/7) of random inputs producing a pattern on the output nodes which you'd only expect for a recognized input.
Now, if you had 15 classes and thus 15 output nodes, you'd be looking at roughly 32767 possible outcomes for unrecognized inputs, only 15 of which correspond to expected one-hot outcomes.
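The counts used here follow from treating each output node as simply high or low and leaving out the all-low pattern, which gives 2^n - 1 patterns for n outputs. A tiny sketch under that simplifying assumption (the function names are made up):

```python
def outcome_patterns(n_outputs):
    """High/low patterns over n output nodes, excluding the all-low pattern."""
    return 2 ** n_outputs - 1

def one_hot_fraction(n_outputs):
    """Share of those patterns that look like a clean one-hot prediction."""
    return n_outputs / outcome_patterns(n_outputs)

print(outcome_patterns(3), round(one_hot_fraction(3), 3))  # 7 0.429
print(outcome_patterns(15))                                # 32767
print(outcome_patterns(4) - 4)                             # 11 ambiguous patterns with 4 outputs
```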
Underlying this is a lack of training data. If your training set has examples outside the 3 classes, you can just dump this in a 4th "other" category and train with that. This by itself isn't a reliable indication, as usually the theoretical "other" set is huge, but you now have 2 complementary ways of detecting other inputs: either by the "other" output node or by one of the 11 ambiguous outputs.
Another solution would be to check what outcome your CNN usually gives when it is shown something else. I believe the last layer should be a softmax, so your CNN returns probabilities for the three given classes. If none of these probabilities is close to 1, this might be a sign that the input is something else, assuming your CNN is well trained (it must be penalized for overconfidence when predicting wrong labels).
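A minimal NumPy sketch of this thresholding idea. The threshold of 0.9 and the use of -1 for "none of the classes" are arbitrary choices for illustration and would need tuning on held-out data.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify_with_reject(logits, threshold=0.9):
    """Return the predicted class per row, or -1 when the top probability is below threshold."""
    probs = softmax(np.asarray(logits, dtype=float))
    best = probs.argmax(axis=-1)
    confident = probs.max(axis=-1) >= threshold
    return np.where(confident, best, -1)

logits = np.array([[8.0, 0.5, 0.1],   # clearly class 0
                   [1.0, 1.1, 0.9]])  # nearly uniform, so rejected
labels = classify_with_reject(logits)
print(labels)  # [ 0 -1]
```

This only helps if the network is not overconfident on unfamiliar inputs, which is the caveat raised in the answer above.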