Multiple target (large) neural network regression using Python - tensorflow

My situation is this: I have an Excel file with 747 nodes as input, each with a float value (imagine 747 columns of floats), and an output of 741 float values/columns. These are the inputs and outputs of a geological simulation, so one row has 747 (input) + 741 (output) = 1488 floats, which is one dataset (from one simulation). I have 4 such datasets (rows) to train a neural network, such that when I test it on 3 test datasets (747 columns each) I get the output of 741 columns. This is just a simple run to get the skeleton of the neural network going before further modifications.
I have come across the Multi-Target Regression example of NYCTaxi (https://github.com/zeahmed/DeepLearningWithMLdotNet/tree/master/NYCTaxiMultiOutputRegression) but I can't seem to wrap my head around it.
This is the training set (input up to and including column 'ABS'; the rest is output):
https://docs.google.com/spreadsheets/d/12TKVbGExt9KcK5RQKTexrToVo8qA5YfeItSaa7E2QdU/edit?usp=sharing
This is the test set:
https://docs.google.com/spreadsheets/d/1-RjyZsdguucCSOr9QTdTp2ehJBqWCr5yz1-aRjQ_4zo/edit?usp=sharing
This is the test Output (To validate) : https://docs.google.com/spreadsheets/d/10O_6711CEpJ4DN1w-kCmW01NikjFVZTDmNRuqO3U_6A/edit?usp=sharing
Any guidance/tips would be well appreciated. TIA!

We can use an autoencoder-style model for this task. An autoencoder takes in the data and compresses it into a latent representation; this representation vector is then used to construct the output variables.
So, you can feed the 747-dimensional input vector to the model and have it generate the 741-dimensional output vector. After proper training, the model will be able to generate the target variables for a given set of inputs.
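A minimal sketch of such a model in Keras (the array names X_train, y_train, X_test are hypothetical; assumed shapes are (4, 747), (4, 741) and (3, 747) per the question):

import tensorflow as tf
from tensorflow import keras

# Hypothetical arrays loaded from the spreadsheets:
# X_train: (4, 747), y_train: (4, 741), X_test: (3, 747)
model = keras.Sequential([
    keras.layers.Dense(256, activation="relu", input_shape=(747,)),  # "encoder"
    keras.layers.Dense(64, activation="relu"),   # latent representation
    keras.layers.Dense(256, activation="relu"),  # "decoder"
    keras.layers.Dense(741),                     # linear output for 741 regression targets
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_train, y_train, epochs=200, verbose=0)
preds = model.predict(X_test)  # shape (3, 741)

With only 4 training rows, the layer sizes hardly matter; the model will essentially memorize the data, but this gives the skeleton asked for.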

Related

How to format data for 1d CNN?

I have a dataset that I need to use with a 1D CNN; however, I am not sure how to structure the data dimensions so that it can be used with the 1D CNN.
The data has 5 output classes, but the input data is where I'm unsure how to proceed. Each output has a matrix of data associated with it, of size 16 x 8000. In other words, for every output I have, there is an associated matrix of numbers that must be fed together to reach that output. I have multiple of these 16 x 1800 matrices for different samples, from which I am trying to make a prediction.
I was wondering how I can create a data frame for this that can be passed into a 1-d CNN?
More specifically, what would the input_shape parameter be set to in the model?
Right now, I am thinking that my output will be (# samples, 5). My input would be along the lines of (# samples, 16 x 1800), but I don't know how this would be implemented in Keras.
Any help would be appreciated.
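A hedged sketch of the model definition, assuming each sample is treated as 1800 timesteps with 16 channels (the question mentions both 16 x 8000 and 16 x 1800; substitute whichever length is correct, since Keras Conv1D expects input_shape=(timesteps, channels)):

from tensorflow import keras

# Assumed data layout: X has shape (n_samples, 1800, 16),
# y has shape (n_samples, 5) as one-hot labels
model = keras.Sequential([
    keras.layers.Conv1D(32, kernel_size=5, activation="relu", input_shape=(1800, 16)),
    keras.layers.GlobalMaxPooling1D(),
    keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

Each 16 x 1800 matrix would be transposed to (1800, 16) so the convolution slides along the long axis.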

Siamese Twin Network: Merging of data streams with a custom function

Since I am not very experienced, I am struggling with a Siamese twin network.
I have 2 images which run through the same CNN and each generate a distinct feature vector. I would like to train a further network to interpret these two image vectors (each with 32 elements). In an intermediate step I would like to use these vectors as input for a function NCC, which sits as a layer between the CNN and the NN and is defined in the following snippet (i.e. its output should be used for the next NN):
import tensorflow as tf
from tensorflow.keras.layers import Flatten

def NCC(a, b):
    # a, b: feature vectors of shape (1, 32) for batch size 1
    l = a.shape[1]
    # zero-mean, unit-norm each vector
    av_a = tf.math.reduce_mean(a)
    av_b = tf.math.reduce_mean(b)
    a = a - av_a
    b = b - av_b
    norm_a = tf.math.sqrt(tf.math.reduce_sum(a * a))
    norm_b = tf.math.sqrt(tf.math.reduce_sum(b * b))
    a = a / norm_a
    b = b / norm_b
    # outer product of the normalized vectors, flattened to l*l values
    A = tf.reshape(tf.repeat(a, axis=0, repeats=l), (l, l))
    B = tf.reshape(tf.repeat(b, axis=0, repeats=l), (l, l))
    ncc = Flatten()(A * tf.transpose(B))
    return ncc
The output vector (for batch size = 1) should have 32 x 32 = 1024 elements. It seems to work for a batch size of 1, but if I increase the batch size I run into trouble, because the input vectors are now tensors with shape=(batch_size, 32). I think this is a very stupid question, but how can I circumvent this issue? (It should be noted that I also want the output tensor to have shape=(batch_size, 1024).)
Thanks in advance
Mike
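One way to make the function batch-aware (a sketch under the shapes stated above, not tested against the original model) is to normalize along the feature axis with keepdims=True and take a per-sample outer product:

import tensorflow as tf

def NCC_batched(a, b):
    # a, b: shape (batch_size, 32)
    a = a - tf.reduce_mean(a, axis=1, keepdims=True)
    b = b - tf.reduce_mean(b, axis=1, keepdims=True)
    a = a / tf.norm(a, axis=1, keepdims=True)
    b = b / tf.norm(b, axis=1, keepdims=True)
    # per-sample outer product, matching A * transpose(B) above:
    # entry (i, j) is b_i * a_j
    outer = tf.einsum('bi,bj->bij', b, a)
    # flatten to (batch_size, 1024)
    return tf.reshape(outer, (tf.shape(a)[0], -1))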

Architecture of CNN to accept 2 sets of inputs

There are multiple examples of how to build a TensorFlow model to recognise cats and dogs from images. Now suppose I have audio associated with each picture and train a separate network to recognise cats and dogs by sound.
I want to feed the predictions of both networks into another layer to combine the results and increase the final prediction success rate.
How should my model look like?
Create two neural networks so that, given an image-audio pair, you feed each input to its corresponding net.
After the convolution steps (or whatever layers you want to use), proceed as you would with a normal CNN; in the last step, before passing data to the FNN, when you flatten the data, do the same with the output of the audio NN.
So, as an example, if the flattened output of the image network has shape 2048 and the audio network 4096, just concatenate these two and make the first layer of the FNN take the sum of these shapes: 2048 + 4096 = 6144. A minimal sketch follows.
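A sketch with the Keras functional API (all shapes here are hypothetical placeholders):

from tensorflow import keras

# Hypothetical inputs: 64x64 RGB images and 4096-sample audio clips
image_in = keras.layers.Input(shape=(64, 64, 3))
x = keras.layers.Conv2D(32, 3, activation="relu")(image_in)
x = keras.layers.MaxPooling2D()(x)
x = keras.layers.Flatten()(x)

audio_in = keras.layers.Input(shape=(4096, 1))
y = keras.layers.Conv1D(16, 9, activation="relu")(audio_in)
y = keras.layers.GlobalMaxPooling1D()(y)

merged = keras.layers.Concatenate()([x, y])           # append the two flattened branches
z = keras.layers.Dense(128, activation="relu")(merged)
out = keras.layers.Dense(2, activation="softmax")(z)  # cat vs. dog

model = keras.Model(inputs=[image_in, audio_in], outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy")

Training then takes a list of two arrays: model.fit([images, audio], labels, ...).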

Subsection of grid as input to cnn

I have two huge grids (input and output) representing some spatial data of the same area. I want to be able to generate the output pixel-by-pixel by feeding a neural network a small part of the input grid, around the pixel of interest.
The naive way of training and evaluating the CNN would be to extract the sections separately and give those to the fit() function. But if the sub-grid the CNN operates on is e.g. a 256x256 area of the input, then I would copy each data point 65536 (!!!) times per epoch.
So is there any way to have Keras just use subsections of a bigger data structure for training?
To me, this sounds a bit like training RNNs on sequential sections of a data series, instead of copying each section separately.
The performance consideration is mainly in the case of evaluating the model: I want to use this model to generate the output grid of a huge geographical area (Denmark) at a resolution of 12.5 cm.
It seems to me that you are looking for a fully convolutional network (FCN).
By using only layers that scale in size with their inputs (banishing dense layers specifically), an FCN is able to produce an output whose spatial extent grows proportionally with that of the input; typically, the output has the same resolution as the input, as in your case.
If your inputs are very large, you can still train an FCN on sub-images. Then, for inference, you can either:
run the network on your entire image: indeed, sometimes the inputs are too big to be batched together during training, but can be fed one at a time for inference;
or split your input into sub-images and tile the results back together. In that case, I would probably use overlapping tiles to avoid potential border effects.
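A minimal FCN sketch in Keras (layer sizes and the single input channel are placeholder assumptions): because every layer is a convolution with padding="same", the input height and width can be left as None, and the same weights run on small training tiles and on the full grid at inference time.

from tensorflow import keras

inputs = keras.layers.Input(shape=(None, None, 1))  # any height/width, 1 channel assumed
x = keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
outputs = keras.layers.Conv2D(1, 1, padding="same")(x)  # per-pixel regression output
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")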
You can probably do well with a Sequence generator.
You will still have to create slices for each batch, but taking slices isn't slow at all compared with the CNN operations.
And by using a keras.utils.Sequence, the generation of the batches runs in parallel with the model's execution, so there is no penalty:
from tensorflow import keras

class GridGenerator(keras.utils.Sequence):
    def __init__(self, originalGrid_maybeFileName, outputGrid, subGridSize):
        self.originalGrid = originalGrid_maybeFileName
        self.outputGrid = outputGrid
        self.subGridSize = subGridSize
        # naive implementation, assuming square grids whose sides are
        # multiples of the sub-grid size
        self.divs = self.originalGrid.shape[0] // self.subGridSize

    def __len__(self):
        return self.divs * self.divs

    def __getitem__(self, i):
        row, column = divmod(i, self.divs)
        r = row * self.subGridSize
        c = column * self.subGridSize
        # channels_last: grids are (height, width, channels);
        # indexing with None adds a batch dimension of size 1
        x = self.originalGrid[None, r:r + self.subGridSize, c:c + self.subGridSize]
        y = self.outputGrid[None, r:r + self.subGridSize, c:c + self.subGridSize]
        return x, y
If the full grid doesn't fit in your PC's memory, then you should find a way of loading parts of the grid at a time (use the generator to load these parts).
Create the generator and train with fit_generator:
generator = GridGenerator(xGrid, yGrid, subSize)
#you can create additional generators to take a part of that as training and another part as validation
model.fit_generator(generator, len(generator), ...., workers = 4)
The workers argument determines how many batches will be loaded in parallel before being sent to the model.

Convolutional Neural Network Training

I have a question regarding convolutional neural network (CNN) training.
I have managed to train a network using TensorFlow that takes an input image (1600 pixels) and outputs the one of three classes that matches it.
Testing the network with variations of the trained classes gives good results. However, when I give it a different, fourth image (one that does not contain any of the 3 trained images), it always returns a random match to one of the classes.
My question is: how can I train a network to classify that an image does not belong to any of the three trained classes? As a similar example, if I trained a network on the MNIST database and then gave it the character "A" or "B", is there a way to discriminate that the input does not belong to any of the classes?
Thank you
Your model will only ever predict the labels it was trained on, so for example if you train your model with MNIST data, the prediction will always be one of 0-9, just like the MNIST labels.
What you can do is first train a different model with 2 classes, in which you predict whether an image belongs to dataset A or B. E.g. for MNIST data you label all the data as 1, add data from other sources that is different (not 0-9) and label it as 0, then train a model to find whether an image belongs to MNIST or not, as sketched below.
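A minimal sketch of that gate model (mnist_images and other_images are hypothetical arrays of 28x28 images):

import numpy as np
from tensorflow import keras

# Hypothetical data: label MNIST images 1, non-digit images 0
X = np.concatenate([mnist_images, other_images])
y = np.concatenate([np.ones(len(mnist_images)), np.zeros(len(other_images))])

gate = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # 1 = "looks like MNIST"
])
gate.compile(optimizer="adam", loss="binary_crossentropy")
gate.fit(X, y, epochs=5)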
A convolutional neural network (CNN) predicts the result from the defined classes after training; it always returns one of those classes, regardless of confidence. I have faced a similar problem. What you can do is check the confidence value: if it is below some threshold, treat the input as belonging to a "none" category (see the sketch below). Hope this helps.
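For example (model and x_new are hypothetical, and the 0.9 threshold is an assumed value to tune on held-out data):

import numpy as np

probs = model.predict(x_new)    # softmax outputs, shape (n, 3)
confidence = probs.max(axis=1)
labels = probs.argmax(axis=1)
labels[confidence < 0.9] = -1   # -1 marks "none of the trained classes"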
You probably have three output nodes and choose the maximum value (one-hot encoding). That's a bit unfortunate, as it's a low number of outputs; non-recognized inputs tend to cause pretty random outputs.
Now, with 3 outputs, roughly speaking you can get 7 outcomes. You might get a single high value (3 possibilities), but a non-recognized input can also cause 2 high outputs (also 3 possibilities) or all three outputs roughly equal (1 possibility). So there's a decent chance (~3/7) of a random input producing a pattern on the output nodes which you'd only expect for a recognized input.
Now, if you had 15 classes and thus 15 output nodes, you'd be looking at roughly 32767 possible outcomes for unrecognized inputs, only 15 of which correspond to expected one-hot outcomes.
Underlying this is a lack of training data. If your training set has examples outside the 3 classes, you can just dump these in a 4th "other" category and train with that. This by itself isn't a reliable indicator, as the theoretical "other" set is usually huge, but you now have 2 complementary ways of detecting other inputs: either by the "other" output node or by one of the 11 ambiguous output patterns.
Another solution would be to check what output your CNN usually gives when shown something else. I believe the last layer must be a softmax, so your CNN returns probabilities for the three given classes. If none of these probabilities is close to 1, this might be a sign that the input is something else, assuming your CNN is well trained (it must be penalized for overconfidence when predicting wrong labels).