Simply switch output usage? - tensorflow

I have a game with only 10x2 pixels as input, and after one hour of training it learns to play by itself. Now I want to use one float output from the model instead of the three classifier outputs. The three classifier outputs were stop, 1-step-right, and 1-step-left. Instead, I want a single output value that tells me e.g. -4 => 4 steps left, +2 => 2 steps right, and so on.
But after training for 1-2 hours, it only produces numbers around 0.001, while it should produce numbers between -10.0 and +10.0.
Do I need to do this in a completely different way, or can I use a classifier model to output a real value without changing much code?
Thanks for the help.
game code link

Training a classifier is much simpler than coming up with a good loss function that will give you scalar values that make sense. Much (!) simpler.
Make it a classifier with 21 classes (0 = 10 left, 1 = 9 left, 2 = 8 left, ..., 10 = stay, 11 = 1 right, ..., 20 = 10 right).
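A minimal sketch of that label scheme (the mapping between step offsets and class indices is my assumption; the model head itself would become Dense(21, activation="softmax") trained with sparse_categorical_crossentropy instead of a single float output):

```python
import numpy as np

# Class k corresponds to a move of k - 10 steps:
# 0 = 10 left, 10 = stay, 20 = 10 right.
NUM_CLASSES = 21

def steps_to_label(steps):
    """Map a move in [-10, +10] to a class label in [0, 20]."""
    return steps + 10

def label_to_steps(label):
    """Map a class label in [0, 20] back to a move in [-10, +10]."""
    return label - 10

# At inference time, argmax over the 21 softmax scores gives the move.
# Here we fake a score vector where the model is sure about "4 left":
scores = np.zeros(NUM_CLASSES)
scores[steps_to_label(-4)] = 1.0
predicted_steps = label_to_steps(int(np.argmax(scores)))
print(predicted_steps)  # -4
```

The game logic then only needs the steps_to_label/label_to_steps conversion around the existing training loop; the rest of the classifier code stays as it is.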

Related

Predict a nonlinear array based on 2 features with scalar values using XGBoost or equivalent

So I have been looking at XGBoost as a place to start with this; however, I am not sure of the best way to accomplish what I want.
My data is set up something like this
Every value, whether input or output, is numerical. The issue I'm facing is that I only have 3 input data points per several output data points.
I have seen that XGBoost has a multi-output regression method; however, I have only really seen it used to predict around 2 outputs per input, whereas my data may have upwards of 50 output points needing to be predicted from only a handful of scalar input features.
I'd appreciate any ideas you may have.
For reference, I've mainly been looking at these two demos (they show the same idea, one with scikit-learn and the other with XGBoost):
https://machinelearningmastery.com/multi-output-regression-models-with-python/
https://xgboost.readthedocs.io/en/stable/python/examples/multioutput_regression.html
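The wrapper pattern from the first demo handles any number of outputs: MultiOutputRegressor fits one independent regressor per target column, so 50 outputs is no different from 2. A small sketch with toy random data matching the question's shapes (Ridge stands in for xgboost.XGBRegressor, which can be dropped in the same way):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor

# Toy data: 3 scalar input features, 50 output targets per row.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = rng.normal(size=(200, 50))

# One independent regressor is fitted per target column, so the number of
# outputs is not limited to 2 or 3.
model = MultiOutputRegressor(Ridge()).fit(X, Y)
preds = model.predict(X)
print(preds.shape)  # (200, 50)
```

Whether 3 features carry enough signal for 50 targets is a separate (data) question, but the tooling side scales without changes.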

Number of units in the last dense layer in case of binary classification

My question is related to this one here. I am using the cats and dogs dataset, so there are only these two outcomes. I found two implementations. The first one uses:
tf.keras.layers.Dense(1)
as the last layer in the model
The second implementation uses:
layers.Dense(2)
Now, I don't understand this. Which is correct? Are they the same, and if so, why (I cannot see why they should be the same)? Or what is the difference? Is the first solution modelling cat or dog, and the second modelling cat, dog, or anything else? Why would that be done if we only have cats and dogs? Which solution should I take?
Both are correct. One uses binary classification and the other uses categorical classification. Let's look at the differences.
Binary classification: In this case, the output layer has only one neuron. From this single output you have to decide whether it's a cat or a dog. You can set any threshold to classify the output. Say cats are labeled 0, dogs are labeled 1, and your threshold is 0.5: if the output is greater than 0.5 it's a dog, because it's closer to 1; otherwise it's a cat. Here binary_crossentropy is used in most cases.
Categorical classification: The number of neurons in the output layer is exactly the number of classes. This time you're not allowed to label your data as 0 or 1; the label shape must match the output layer. In your case the output layer has two neurons (one per class), so you have to encode your labels accordingly. This is called one-hot encoding: the cats are encoded as (1,0) and the dogs as (0,1), for example. Now each prediction consists of two floating-point numbers; if the first is greater than the second it's a cat, otherwise it's a dog. These numbers are called confidence scores. Say your model predicts (0.70, 0.30) for a test image: the model is 70% confident that it's a cat and 30% confident that it's a dog. Note that the values in the output layer depend entirely on that layer's activation; to learn more, read about activation functions.
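A small numeric sketch of why the two heads really are the same model: a single sigmoid neuron with logit z produces the same P(dog) as a two-neuron softmax over the logits [0, z] (the specific logit value here is arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

z = 1.7  # arbitrary logit from the Dense(1) head
p_dog_binary = sigmoid(z)                      # Dense(1) + sigmoid
p_cat, p_dog_categorical = softmax([0.0, z])   # Dense(2) + softmax with logits [0, z]
print(np.isclose(p_dog_binary, p_dog_categorical))  # True
```

Softmax over two logits only depends on their difference, which is why fixing one logit at 0 recovers the sigmoid; the Dense(2) version just lets the network learn both logits freely.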

Use of DeepExplainer to get shap values for an MLP model in Keras with tensorflow backend

I am playing around with DeepExplainer to get SHAP values for deep learning models. By following some tutorials I can get some results, i.e. which variables push the model prediction away from the base value, which is the average model output on the training set.
I have around 5,000 observations and 70 features. The performance of DeepExplainer is quite satisfactory, and my code is:
model0 = load_model(model_p + 'health0.h5')
# Scaled training set used as the background distribution
background = healthScaler.transform(train[healthFeatures])
e = shap.DeepExplainer(model0, background)
# SHAP values for the (scaled) test set
shap_values = e.shap_values(healthScaler.transform(test[healthFeatures]))
# Scaled copy of the test features, kept as a DataFrame for plotting
test2 = test[healthFeatures].copy()
test2[healthFeatures] = healthScaler.transform(test[healthFeatures])
# Force plot for observation 947
shap.force_plot(e.expected_value[0], shap_values[0][947, :], test2.iloc[947, :])
And the plot is the following:
Here the base value is 0.012 (can also be seen through e.expected_value[0]) and very close to the output value which is 0.01.
At this point I have some questions:
1) The output value is not identical to the prediction obtained through model0.predict(test[healthFeatures])[947] = -0.103. How should I interpret the output value?
2) As can be seen, I am using the whole training set as the background to approximate the conditional expectations of the SHAP values. What is the difference between using random samples from the training set and using the entire set? Is it only a performance issue?
Many thanks in advance!
Probably too late, but still a very common question that will benefit other beginners. To answer (1): the expected and output values will be different. The expected value is, as the name suggests, the average over the scores predicted by your model; e.g., if the model outputs probabilities, it is the average of the probabilities your model spits out. For (2): as long as the background set has fewer than ~5k samples, it won't change much, but beyond 5k your calculations can take days to finish.
See this (lines 21-25) for more comprehensive answers.
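For point (2), subsampling the background is a one-liner; here is a sketch with a hypothetical matrix standing in for the scaled training set (the 5,000 x 70 shape mirrors the question, and 500 is an assumed sample size, not a SHAP requirement):

```python
import numpy as np

# Stand-in for healthScaler.transform(train[healthFeatures]):
rng = np.random.default_rng(42)
background_full = rng.normal(size=(5000, 70))

# Draw a few hundred distinct rows to use as DeepExplainer's background;
# well below the ~5k mark, estimates barely change but run much faster.
idx = rng.choice(background_full.shape[0], size=500, replace=False)
background_small = background_full[idx]
print(background_small.shape)  # (500, 70)
```

The smaller matrix is then passed to shap.DeepExplainer in place of the full training set.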

tensorflow: how to recognize untrained new letter

I have a question: I want the system to return -1 as an "unknown" character for new, untrained letters. For example, if I have trained on 1/2/3/4, then when I test the character '5' or '6', TensorFlow should return -1 as an unknown character.
Is it possible?
Thanks.
I'd think for simple classifications, you're looking for anything that has less than a certain confidence/score of being a known class.
To be fair, I've only used Keras on top of TensorFlow, so YMMV.
I'd just train it on the 4 categories you know; then, when it classifies, if the top result has less than a certain raw score/weight (say it classifies an unknown 7 as a 4, but with a mediocre score), treat it as a -1.
This might not work with every loss/objective function you train your model on, but it should work with MSE or categorical cross-entropy if you can get the raw final scores.
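A minimal sketch of that thresholding idea, assuming the model's softmax output over the 4 trained classes is available (the 0.8 cutoff is an arbitrary assumption and should be tuned on held-out data):

```python
import numpy as np

THRESHOLD = 0.8  # assumed cutoff; tune on held-out data

def classify_or_reject(probs, threshold=THRESHOLD):
    """Return the predicted class index, or -1 if the top score is too low."""
    top = int(np.argmax(probs))
    return top if probs[top] >= threshold else -1

print(classify_or_reject(np.array([0.05, 0.9, 0.03, 0.02])))  # confident -> 1
print(classify_or_reject(np.array([0.3, 0.35, 0.2, 0.15])))   # unsure -> -1
```

Note that softmax scores are not calibrated probabilities, so this is a heuristic; unknown inputs can still produce confident-looking scores.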

Should my seq2seq RNN idea work?

I want to predict stock price.
Normally, people would feed the input as a sequence of stock prices.
Then they would feed the output as the same sequence but shifted to the left.
When testing, they would feed the output of the prediction into the next input timestep like this:
I have another idea, which is to fix the sequence length, for example 50 timesteps.
The input and output are exactly the same sequence.
When training, I replace last 3 elements of the input by zero to let the model know that I have no input for those timesteps.
When testing, I would feed the model a sequence of 50 elements. The last 3 are zeros. The predictions I care are the last 3 elements of the output.
Would this work or is there a flaw in this idea?
The main flaw of this idea is that it does not add anything to the model's learning, and it reduces the model's capacity, since you force it to learn an identity mapping for the first 47 steps (50-3). Note that providing 0 as input is equivalent to providing no input at all to an RNN: a zero input, after multiplication by a weight matrix, is still zero, so the only sources of information are the bias and the output from the previous timestep, both of which are already present in the original formulation. As for the second add-on, where we have outputs for the first 47 steps: there is nothing to be gained by learning the identity mapping, yet the network will have to "pay the price" for it; it will need to spend weights encoding this mapping in order not to be penalised.
So in short: yes, your idea will work, but it is nearly impossible to get better results this way compared to the original approach (you do not provide any new information and do not really modify the learning dynamics, yet you limit capacity by requiring an identity mapping to be learned per step; especially since it is an extremely easy thing to learn, gradient descent will discover this relation first, before even trying to "model the future").
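The zero-input claim can be checked numerically with one vanilla RNN step, h_t = tanh(W x_t + U h_prev + b): with x_t = 0 the step reduces to tanh(U h_prev + b). A toy sketch with random weights (not a trained model):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))   # input-to-hidden weights
U = rng.normal(size=(4, 4))   # hidden-to-hidden weights
b = rng.normal(size=4)        # bias
h_prev = rng.normal(size=4)   # previous hidden state

x_zero = np.zeros(3)
h_with_zero_input = np.tanh(W @ x_zero + U @ h_prev + b)
h_without_input = np.tanh(U @ h_prev + b)
print(np.allclose(h_with_zero_input, h_without_input))  # True
```

So the zeroed timesteps carry no information beyond what the recurrence already has, which is exactly why the proposed setup cannot outperform the standard shifted-sequence formulation.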