I'm building a model using TensorFlow where the input is a slice of the output. Think of the output layer as a 2D array; one row of that array is the input data. The neural network currently tries to connect the input to the output using a mean-squared error loss function. It's doing a fairly good job, but the accuracy needs to be improved a little.
To do that, I'm trying to add another physics-based loss function. If I can have the network place the input slice in its correct location in the output, that would greatly simplify the problem as each row in the output 2D array depends on the two rows above it.
I hope this makes sense.
Related
In my Keras custom loss function, I would like to know the sample indices (as in the original input array) for the current y_true and y_pred tensors.
I know it sounds weird, but to calculate the loss I need some additional information, which I prepare in an external array that is part of neither the input array nor the expected output array.
The only solution I currently see is to include it in the expected output array as additional columns, so that I get it in y_true. However, I am not sure how much it would disturb the NN and the optimizer to have one extra node in the output layer whose actual prediction is not correlated with the calculated loss...
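For what it's worth, here is a minimal sketch of the extra-columns idea in which the output layer does not need an extra node: y_true simply carries more columns than y_pred, and the loss slices them apart. The name weighted_mse, the constant N_TARGETS, and the use of the extra column as a per-sample weight are assumptions for illustration only; the question does not say how the extra information enters the loss.

```python
import numpy as np
import tensorflow as tf

N_TARGETS = 1  # assumption: the network predicts one value per sample

def weighted_mse(y_true, y_pred):
    """MSE in which y_true carries one extra column of side information.

    Columns [0, N_TARGETS) of y_true are the real targets. The remaining
    column is used here as a hypothetical per-sample weight.
    """
    y_true = tf.cast(y_true, y_pred.dtype)
    targets = y_true[:, :N_TARGETS]   # shape (batch, N_TARGETS)
    extra = y_true[:, N_TARGETS]      # shape (batch,)
    per_sample = tf.reduce_mean(tf.square(targets - y_pred), axis=-1)
    return tf.reduce_mean(extra * per_sample)

# Toy data: 4 samples, target in column 0, side information in column 1.
y_true_packed = np.array(
    [[1.0, 1.0], [2.0, 0.0], [3.0, 2.0], [4.0, 1.0]], dtype="float32"
)
y_pred = tf.constant([[0.0], [0.0], [1.0], [4.0]], dtype=tf.float32)
loss_value = float(weighted_mse(y_true_packed, y_pred))
```

Because y_pred keeps its original width, no output neuron is trained on the side information. A function like this should be usable as the loss argument of model.compile, but it is worth checking on your Keras version that a y_true wider than y_pred is accepted.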
I am researching population by country, based on this data set:
https://www.kaggle.com/tanuprabhu/population-by-country-2020
I learned that it's best practice to normalize the dataset before training, so I normalized the data using sklearn.preprocessing MinMaxScaler. I proceeded to train the model using the normalized dataset before saving the model.
Next, I wanted to perform predictions on new data. So I created an input file with a similar format to the training dataset. The new input data has only 2 rows (versus the training dataset which has 200 rows).
The problem I encounter is that, because the new dataset has so few rows, MinMaxScaler returned only 1 and 0: 1 for the larger number and 0 for the smaller one. When I fed this input into the model, it gave me a prediction far off from the expected value.
I have also tried applying MinMaxScaler to the new data, feeding it into the model, and then inverse-transforming the result. Still, I got a value far from the expected one.
I have also tried training the model without applying MinMaxScaler. I got a better result with this model, but the prediction responds well only when I change the columns with larger values. The columns with smaller values get little response, although I know that in the real world these factors are quite significant to the predicted result.
Where did I go wrong?
Any sample code on handling the input for the trained model is much appreciated.
To test what is going on, I suggest taking a row of your training data from before scaling. Apply the scaler to it and use the result as the input for a prediction. You should get the same prediction the model gives for that training row. When you apply the scaler, check that it produces the same values as the scaled training data for that row. Make sure you are using the scaler that was fit to the training set. Do not fit the scaler to the new data; only use it to transform the data.
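A minimal sketch of that check, assuming scikit-learn's MinMaxScaler and made-up data (X_train, X_new and the column ranges are placeholders, not the Kaggle dataset). It saves the fitted scaler with pickle, reloads it at prediction time, and contrasts transforming with refitting on two rows.

```python
import pickle
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training features: 200 rows, 3 columns on very different scales.
rng = np.random.default_rng(0)
X_train = rng.uniform(low=[0, 0, 0], high=[1e9, 100, 5], size=(200, 3))

# Fit the scaler once, on the training data only.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Persist the fitted scaler alongside the model (bytes here; a file in practice).
scaler_bytes = pickle.dumps(scaler)

# --- later, at prediction time ---
loaded_scaler = pickle.loads(scaler_bytes)
X_new = X_train[:2].copy()  # two raw rows standing in for the new input file

# Correct: transform with the training-time scaler, no refitting.
X_new_scaled = loaded_scaler.transform(X_new)

# Wrong: refitting on two rows maps each column to just 0 and 1.
X_new_refit = MinMaxScaler().fit_transform(X_new)

# The correct path reproduces the scaled training rows.
same_as_training = np.allclose(X_new_scaled, X_train_scaled[:2])
```

If the target column was scaled as well, the same reasoning applies in reverse: keep the scaler that was fit on the training targets and call its inverse_transform on the model's predictions.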
I was trying to visualize the outputs of all the activation functions in each layer of my fully connected network, and I was surprised when I checked the last layer. It is a very simple regression model, so the output layer has one neuron. I know it would be better to visualize it as a scalar, but I was just trying to visualize it using histograms, and I realized that the value is somehow split. Does this make any sense? I would have expected either that it cannot be visualized at all or that the histogram would consist of a single point.
The classification task is based on an image and a scalar value.
If I encode the scalar value as image pixels with that value (or a normalized version of it) and append it as another layer in the input image, I would be wasting convolutional computation cycles on the encoding just to get this information into the network.
On the other hand, I can send it as another neuron to the layer where the convolved feature maps are flattened. Another option would be to add it just before the output layer. (But how do I implement such a network in Keras or TensorFlow?)
Which is the best method to send in scalar values?
PS: Although this question is not specific to any framework, Keras examples would be great in a way that they are simple enough for most people to understand... Links to blogs addressing the same are welcome too.
See this question and answer on the Cross Validated site: Combining image and scalar inputs into a neural network
In addition to the "bias" method suggested in the paper mentioned there (where the scalar is used as a bias to some convolution layer), and the other option suggested in the answer, which is to append the scalar to some flattened layer, you can also use an inner product layer (fully connected, "Dense" in Keras) to find the connectivity pattern between the N-D input and the scalar.
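Since the question asked for a Keras example, here is a rough sketch of the "append the scalar to a flattened layer" option using the functional API. The input shapes, layer sizes, number of classes and the input names "image" and "scalar" are all assumptions chosen for illustration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Assumed shapes for illustration: 32x32 RGB images, one scalar, 3 classes.
image_in = keras.Input(shape=(32, 32, 3), name="image")
scalar_in = keras.Input(shape=(1,), name="scalar")

# The convolutional branch sees only the image.
x = layers.Conv2D(8, 3, activation="relu")(image_in)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)

# Append the scalar to the flattened feature vector.
merged = layers.Concatenate()([x, scalar_in])
h = layers.Dense(16, activation="relu")(merged)
out = layers.Dense(3, activation="softmax")(h)

model = keras.Model(inputs=[image_in, scalar_in], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Smoke test on random data: one row of class probabilities per sample.
images = np.random.rand(4, 32, 32, 3).astype("float32")
scalars = np.random.rand(4, 1).astype("float32")
probs = model.predict([images, scalars], verbose=0)
```

To feed the scalar in just before the output layer instead, the Concatenate call could be moved after the hidden Dense layer. Which placement works better probably depends on the task, so it is worth trying both.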
I am confused about what dynamic RNN (i.e. dynamic_rnn) is. It returns an output and a state in TensorFlow. What are these state and output? What is dynamic in a dynamic RNN, in TensorFlow?
Dynamic RNNs allow for variable sequence lengths. You might have an input shape (batch_size, max_sequence_length), but this will allow you to run the RNN for the correct number of time steps on those sequences that are shorter than max_sequence_length.
In contrast, there are static RNNs, which always run for the full, fixed number of time steps. There are cases where you might prefer this, such as when you are padding your inputs to max_sequence_length anyway.
In short, dynamic_rnn is usually what you want for variable length sequential data. It has a sequence_length parameter, and it is your friend.
While AlexDelPiero's answer was what I was googling for, the original question was different. You can take a look at this detailed description about LSTMs and intuition behind them. LSTM is the most common example of an RNN.
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
The short answer is: the state is an internal detail that is passed from one timestep to the next. The output is a tensor containing the outputs at every timestep. You usually need to pass all outputs to the next RNN layer, or only the last output in the case of the last RNN layer. To get the last output when every sequence spans the full length, you can use output[:, -1, :]
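To make the difference between output and state concrete, here is a NumPy sketch (not TensorFlow's implementation) of a plain tanh RNN that imitates the documented sequence_length behaviour of dynamic_rnn: past the end of a sequence, its outputs are zeroed and its state is copied through. The function simple_rnn and the weights W_x and W_h are hypothetical, and the cell has no bias.

```python
import numpy as np

def simple_rnn(inputs, sequence_length, W_x, W_h):
    """Tanh RNN over padded inputs of shape (batch, max_time, features).

    Past a sequence's length, its outputs are zeroed and its state is
    copied through unchanged.
    """
    batch, max_time, _ = inputs.shape
    units = W_h.shape[0]
    state = np.zeros((batch, units))
    outputs = np.zeros((batch, max_time, units))
    for t in range(max_time):
        new_state = np.tanh(inputs[:, t, :] @ W_x + state @ W_h)
        active = (t < sequence_length)[:, None]  # (batch, 1) mask
        state = np.where(active, new_state, state)
        outputs[:, t, :] = np.where(active, new_state, 0.0)
    return outputs, state

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 3))   # 2 sequences padded to 5 steps, 3 features
lengths = np.array([5, 3])       # the second sequence is really 3 steps long
W_x = rng.normal(size=(3, 4))
W_h = rng.normal(size=(4, 4))

outputs, state = simple_rnn(x, lengths, W_x, W_h)

last_step = outputs[:, -1, :]                       # all zeros for the short row
last_valid = outputs[np.arange(2), lengths - 1, :]  # last real output per row
```

Here outputs has shape (batch, max_time, units) and state has shape (batch, units). For this simple cell the final state equals the last valid output of each sequence, so with padded batches the state (or an index at length - 1) is the safer way to get the "last output" than output[:, -1, :]. With an LSTM cell the state is a pair (cell state and hidden state) rather than a single tensor.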