Google Cloud ml-engine fails predicting multiple inputs - tensorflow

Predictions are only successful when providing a single instance in instance.json.
Test 1: Contents of instance.json:
{"serving_input": [20.0, 0.0, 1.0 ... 0.16474569041197143, 0.04138248072194471], "prediction_id": 0, "keep_prob": 1.0}
Prediction (same output for local and online prediction)
gcloud ml-engine local predict --model-dir=./model_dir --json-instances=instances.json
Output:
SERVING_OUTPUT ARGMAX PREDICTION_ID SCORES TOP_K
[-340.6920166015625, -1153.0877685546875] 0 0 [1.0, 0.0] [1.0, 0.0]
Test 2: Contents of instance.json:
{"serving_input": [20.0, 0.0, 1.0 ... 0.16474569041197143, 0.04138248072194471], "prediction_id": 0, "keep_prob": 1.0}
{"serving_input": [21.0, 2.0, 3.0 ... 3.14159265359, 0.04138248072194471], "prediction_id": 1, "keep_prob": 1.0}
Output:
.. Incompatible shapes: [2] vs. [2,108] .. (_arg_keep_prob_0_1, Model/dropout/random_uniform)
Here 108 is the size of the first hidden layer (net_dim=[2015, 108, 2]). Dropout is applied with tf.nn.dropout, hence the keep_prob=1.0 input.
Exporting code:
probabilities = tf.nn.softmax(self.out_layer)
top_k, _ = tf.nn.top_k(probabilities, self.network_dim[-1])

prediction_signature = (
    tf.saved_model.signature_def_utils.predict_signature_def(
        inputs={'serving_input': self.x, 'keep_prob': self.keep_prob,
                'prediction_id': self.prediction_id_in},
        outputs={'serving_output': self.out_layer, 'argmax': tf.argmax(self.out_layer, 1),
                 'prediction_id': self.prediction_id_out, 'scores': probabilities,
                 'top_k': top_k}))

builder.add_meta_graph_and_variables(
    sess,
    tags=[tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
            prediction_signature
    },
    main_op=tf.saved_model.main_op.main_op())
builder.save()
How can I format instance.json to perform a batched prediction (i.e. prediction with multiple input instances)?

The problem is not in the JSON. Check how you are using self.x.
I think that your code is assuming that it's a 1D array, when you should treat it as a tensor of shape [?, 108].
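In other words, every per-instance input in the exported signature needs a leading batch dimension. A hypothetical sketch of batch-friendly placeholder definitions (input_dim and hidden_layer are stand-ins, not the poster's actual variables); note that keep_prob then arrives as one value per instance, i.e. with shape [batch], and has to be reduced to the scalar that tf.nn.dropout expects:
# Hypothetical shapes; input_dim is the width of one serving_input vector.
x = tf.placeholder(tf.float32, shape=[None, input_dim], name='serving_input')
prediction_id_in = tf.placeholder(tf.int64, shape=[None], name='prediction_id')
keep_prob = tf.placeholder(tf.float32, shape=[None], name='keep_prob')  # one value per instance

# Use a single scalar for dropout, whatever the batch size is.
dropped = tf.nn.dropout(hidden_layer, keep_prob=keep_prob[0])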

Related

TFlite: set_tensor() takes 3 positional arguments but 4 were given

I've written a simple program to calculate a quadratic equation with TensorFlow. Now, I'd like to adapt the code to run on the Coral Dev Board using TensorFlow Lite.
The following code shows the generation of tflite-file:
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Define and compile the neural network
model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer='sgd', loss='mean_squared_error')
# Provide the data
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)
# Generation TFLite Model
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
# Save the TFLite-Model
with open('mobilenet_v2_1.0_224.tflite', 'wb') as f:
f.write(tflite_model)
This code runs on the Coral Dev Board:
import numpy as np
import tflite_runtime.interpreter as tflite  # or: from tensorflow import lite as tflite

# Load TFLite model and allocate tensors.
interpreter = tflite.Interpreter(model_path="mobilenet_v2_1.0_224.tflite")
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Test model on random input data.
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=np.float32)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], xs, ys)
...
The last line of code raises an error:
TypeError: set_tensor() takes 3 positional arguments but 4 were given
The output of input_details[0]:
{'name': 'serving_default_dense_input:0',
'index': 0,
'shape': array([1, 1], dtype=int32),
'shape_signature': array([-1, 1], dtype=int32),
'dtype': <class 'numpy.float32'>,
'quantization': (0.0, 0),
'quantization_parameters':
{'scales': array([], dtype=float32),
'zero_points': array([], dtype=int32),
'quantized_dimension': 0},
'sparsity_parameters': {}
}
I don't understand the cause of the error. Does anyone have an idea?
Your error is the following: you are passing two arrays (xs and ys) to the set_tensor method, which only accepts a tensor index and a single value.
That means that when Python reads that line of code, it raises a TypeError, since you are passing one positional argument too many. That is the why of your error!
Now to fix your code. First you need to understand that the set_tensor method expects the index of the given tensor plus a single value for it. The index you want to pass is the one shown in the output of interpreter.get_input_details(), which in your case is 0.
Also, you are supposed to set only one of the arrays, either the test data or the train data, not both at the same time. So eliminate either the xs or the ys variable.
So just rewrite this line like this
interpreter.set_tensor(0, ys)
I hope this gets it right. It is usually good to also take a look at the documentation, so you understand what each method expects: https://www.tensorflow.org/api_docs/python/tf/lite/Interpreter#set_tensor
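For reference, a minimal inference-only sketch with this interpreter could look as follows (the tflite_runtime import is an assumption about the Coral setup; the input shape comes from input_details[0]['shape'], which is [1, 1] here):
import numpy as np
import tflite_runtime.interpreter as tflite  # or: from tensorflow import lite as tflite

interpreter = tflite.Interpreter(model_path="mobilenet_v2_1.0_224.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed one sample at a time, shaped like input_details[0]['shape'].
x = np.array([[10.0]], dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], x)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]['index']))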
My approach was wrong. xs holds the x-values and ys the y-values (the results) of the quadratic equation. I was not aware that you cannot do training in TFLite. But thanks for the effort anyway.

In Keras what is the difference between Conv2DTranspose and Conv2D

I'm currently building a GAN with Tensorflow 2 and Keras and noticed a lot of the existing Neural Networks for the generator and discriminator use Conv2D and Conv2DTranspose in Keras.
I'm struggling to find something that functionally explains the difference between the two. Can anyone explain what these two different options for making a NN in Keras mean?
Conv2D applies a convolution operation to the input. Conv2DTranspose, on the contrary, applies a deconvolution (transposed convolution) operation to the input.
For example:
import numpy as np
import tensorflow as tf

x = tf.random.uniform((1,3,3,1))
conv2d = tf.keras.layers.Conv2D(1,2)(x)
print(conv2d.shape)
# (1, 2, 2, 1)
conv2dTranspose = tf.keras.layers.Conv2DTranspose(1,2)(x)
print(conv2dTranspose.shape)
# (1, 4, 4, 1)
Conv2D is mainly used when you want to detect features, e.g., in the encoder part of an autoencoder model, and it may shrink your input shape.
Conversely, Conv2DTranspose is used for creating features, for example, in the decoder part of an autoencoder model for constructing an image. As you can see in the above code, it makes the input shape larger.
For example:
kernel = tf.constant_initializer(1.)
x = tf.ones((1,3,3,1))
conv = tf.keras.layers.Conv2D(1,2, kernel_initializer=kernel)
y = tf.ones((1,2,2,1))
de_conv = tf.keras.layers.Conv2DTranspose(1,2, kernel_initializer=kernel)
conv_output = conv(x)
print("Convolution\n---------")
print("input shape:",x.shape)
print("output shape:",conv_output.shape)
print("input tensor:",np.squeeze(x.numpy()).tolist())
print("output tensor:",np.around(np.squeeze(conv_output.numpy())).tolist())
'''
Convolution
---------
input shape: (1, 3, 3, 1)
output shape: (1, 2, 2, 1)
input tensor: [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
output tensor: [[4.0, 4.0], [4.0, 4.0]]
'''
de_conv_output = de_conv(y)
print("De-Convolution\n------------")
print("input shape:",y.shape)
print("output shape:",de_conv_output.shape)
print("input tensor:",np.squeeze(y.numpy()).tolist())
print("output tensor:",np.around(np.squeeze(de_conv_output.numpy())).tolist())
'''
De-Convolution
------------
input shape: (1, 2, 2, 1)
output shape: (1, 3, 3, 1)
input tensor: [[1.0, 1.0], [1.0, 1.0]]
output tensor: [[1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 1.0]]
'''
To sum up:
Conv2D:
May shrink your input
For detecting features
Conv2DTranspose:
Enlarges your input
For constructing features
And if you want to know how Conv2DTranspose enlarges the input, here is a rough rule with a quick check:
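With padding='valid' and a kernel at least as large as the stride, the output length is (input - 1) * stride + kernel. A small sketch to verify it against the shapes above:
import tensorflow as tf

# Size rule for Conv2DTranspose with padding='valid' and kernel >= stride:
#   out = (in - 1) * stride + kernel
x = tf.random.uniform((1, 3, 3, 1))
for kernel_size, stride in [(2, 1), (3, 2)]:
    y = tf.keras.layers.Conv2DTranspose(1, kernel_size, strides=stride)(x)
    print(kernel_size, stride, y.shape, (3 - 1) * stride + kernel_size)
# 2 1 (1, 4, 4, 1) 4
# 3 2 (1, 7, 7, 1) 7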

Connect custom input pipeline to tf model

I am currently trying to get a simple TensorFlow model to train on data provided by a custom input pipeline. It should work as efficiently as possible. Although I've read lots of tutorials, I can't get it to work.
THE DATA
I have my training data split over several CSV files. File 'a.csv' has 20 samples and 'b.csv' has 30. They have the same structure with the same header:
feature1; feature2; feature3; feature4
0.1; 0.2; 0.3; 0.4
...
(No labels, as it is for an autoencoder.)
THE CODE
I have written an input pipeline and would like to feed the data from it to the model. My code looks like this:
import tensorflow as tf

def input_pipeline(filenames, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices(filenames)
    dataset = dataset.flat_map(
        lambda filename: (
            tf.data.TextLineDataset(filename)
            .skip(1)
            .shuffle(10)
            .map(lambda csv_row: tf.decode_csv(
                csv_row,
                record_defaults=[[-1.0]] * 4,
                field_delim=';'))
            .batch(batch_size)
        )
    )
    return dataset.make_initializable_iterator()

iterator = input_pipeline(['/home/sku/data/a.csv',
                           '/home/sku/data/b.csv'],
                          batch_size=5)
next_element = iterator.get_next()

# Build the autoencoder
x = tf.placeholder(tf.float32, shape=[None, 4], name='in')
z = tf.contrib.layers.fully_connected(x, 2, activation_fn=tf.nn.relu)
x_hat = tf.contrib.layers.fully_connected(z, 4)

# loss function with epsilon for numeric stability
epsilon = 1e-10
loss = -tf.reduce_sum(
    x * tf.log(epsilon + x_hat) + (1 - x) * tf.log(epsilon + 1 - x_hat))

train_op = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(iterator.initializer)
    sess.run(tf.global_variables_initializer())
    for i in range(50):
        batch = sess.run(next_element)
        sess.run(train_op, feed_dict={x: batch, x_hat: batch})
THE PROBLEM
When trying to feed the data to the model, I get an error:
ValueError: Cannot feed value of shape (4, 5) for Tensor 'in:0', which has shape '(?, 4)'
When printing out the shapes of the batched data, I get this for example:
(array([ 4.1, 5.9, 5.5, 6.7, 10. ], dtype=float32), array([0.4, 7.7, 0. , 3.4, 8.7], dtype=float32), array([3.5, 4.9, 8.3, 7.2, 6.4], dtype=float32), array([-1. , -1. , 9.6, -1. , -1. ], dtype=float32))
It makes sense, but where and how do I have to reshape this? Also, the additional dtype info only appears with batching.
I also considered that I did the feeding wrong. Do I need input_fn or something like that? I remember that feeding dicts is way too slow. If somebody could give me an efficient way to prepare and feed the data, I would be really grateful.
I've figured out a solution that requires a second mapping function. You have to add the following line to the input function:
def input_pipeline(filenames, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices(filenames)
    dataset = dataset.flat_map(
        lambda filename: (
            tf.data.TextLineDataset(filename)
            .skip(1)
            .shuffle(10)
            .map(lambda csv_row: tf.decode_csv(
                csv_row,
                record_defaults=[[-1.0]] * 4,
                field_delim=';'))
            .map(lambda *inputs: tf.stack(inputs))  # <-- mapping required
            .batch(batch_size)
        )
    )
    return dataset.make_initializable_iterator()
This seems to convert the array-like output into a matrix that can be fed to the network.
However, I'm still not sure if feeding it via feed_dict is the most efficient way. I'd still appreciate support here!
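Regarding efficiency: one alternative (a sketch reusing the same pipeline and model as above) is to drop feed_dict entirely and build the graph directly on the iterator's output, so the batches never leave the TensorFlow runtime:
iterator = input_pipeline(['/home/sku/data/a.csv', '/home/sku/data/b.csv'], batch_size=5)
x = iterator.get_next()  # shape (batch, 4) thanks to the tf.stack map

z = tf.contrib.layers.fully_connected(x, 2, activation_fn=tf.nn.relu)
x_hat = tf.contrib.layers.fully_connected(z, 4)

epsilon = 1e-10
loss = -tf.reduce_sum(
    x * tf.log(epsilon + x_hat) + (1 - x) * tf.log(epsilon + 1 - x_hat))
train_op = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(iterator.initializer)
    sess.run(tf.global_variables_initializer())
    for i in range(50):
        sess.run(train_op)  # no feed_dict: the iterator provides each batch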

How to properly use tf.metrics.accuracy?

I have some trouble using the accuracy function from tf.metrics for a multi-class classification problem with logits as input.
My model output looks like:
logits = [[0.1, 0.5, 0.4],
[0.8, 0.1, 0.1],
[0.6, 0.3, 0.2]]
And my labels are one hot encoded vectors:
labels = [[0, 1, 0],
[1, 0, 0],
[0, 0, 1]]
When I try to do something like tf.metrics.accuracy(labels, logits) it never gives the correct result. I am obviously doing something wrong but I can't figure what it is.
TL;DR
The accuracy function tf.metrics.accuracy calculates how often predictions match labels, based on two local variables it creates, total and count, which are used to compute the frequency with which logits match labels.
acc, acc_op = tf.metrics.accuracy(labels=tf.argmax(labels, 1),
predictions=tf.argmax(logits,1))
print(sess.run([acc, acc_op]))
print(sess.run([acc]))
# Output
#[0.0, 0.66666669]
#[0.66666669]
acc (accuracy): simply returns the metric using total and count; it doesn't update the metric.
acc_op (update op): updates the metric.
To understand why the acc returns 0.0, go through the details below.
Details using a simple example:
logits = tf.placeholder(tf.int64, [2,3])
labels = tf.Variable([[0, 1, 0], [1, 0, 1]])
acc, acc_op = tf.metrics.accuracy(labels=tf.argmax(labels, 1),
predictions=tf.argmax(logits,1))
Initialize the variables:
Since metrics.accuracy creates two local variables total and count, we need to call local_variables_initializer() to initialize them.
sess = tf.Session()
sess.run(tf.local_variables_initializer())
sess.run(tf.global_variables_initializer())
stream_vars = [i for i in tf.local_variables()]
print(stream_vars)
#[<tf.Variable 'accuracy/total:0' shape=() dtype=float32_ref>,
# <tf.Variable 'accuracy/count:0' shape=() dtype=float32_ref>]
Understanding update ops and accuracy calculation:
print('acc:',sess.run(acc, {logits:[[0,1,0],[1,0,1]]}))
#acc: 0.0
print('[total, count]:',sess.run(stream_vars))
#[total, count]: [0.0, 0.0]
The above returns 0.0 for accuracy as total and count are zeros, in spite of being given matching inputs.
print('ops:', sess.run(acc_op, {logits:[[0,1,0],[1,0,1]]}))
#ops: 1.0
print('[total, count]:',sess.run(stream_vars))
#[total, count]: [2.0, 2.0]
With the new inputs, the accuracy is calculated when the update op is called. Note: since all the logits and labels match, we get an accuracy of 1.0, and the local variables total and count actually give the total correctly predicted and the total comparisons made.
Now we call accuracy with the new inputs (not the update ops):
print('acc:', sess.run(acc,{logits:[[1,0,0],[0,1,0]]}))
#acc: 1.0
The accuracy call doesn't update the metrics with the new inputs; it just returns the value using the two local variables. Note: the logits and labels don't match in this case. Now calling the update op again:
print('op:',sess.run(acc_op,{logits:[[0,1,0],[0,1,0]]}))
#op: 0.75
print('[total, count]:',sess.run(stream_vars))
#[total, count]: [3.0, 4.0]
The metrics are updated with the new inputs.
More information on how to use the metrics during training and how to reset them during validation can be found here.
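As a sketch of the reset part, re-initializing the two local variables created above sets the streaming accuracy back to zero:
# Reset tf.metrics.accuracy by re-initializing its local variables (total and count).
acc_vars = [v for v in tf.local_variables() if 'accuracy' in v.name]
acc_reset_op = tf.variables_initializer(acc_vars)
sess.run(acc_reset_op)
print(sess.run(stream_vars))  # [0.0, 0.0]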
On TF 2.0, if you are using the tf.keras API, you can define a custom class myAccuracy which inherits from tf.keras.metrics.Accuracy and overrides the update_state method like this:
# imports
# ...
class myAccuracy(tf.keras.metrics.Accuracy):
    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.argmax(y_true, 1)
        y_pred = tf.argmax(y_pred, 1)
        return super(myAccuracy, self).update_state(y_true, y_pred, sample_weight)
Then, when compiling the model you can add metrics in the usual way.
from my_awesome_models import discriminador
discriminador.compile(tf.keras.optimizers.Adam(),
loss=tf.nn.softmax_cross_entropy_with_logits,
metrics=[myAccuracy()])
from my_puzzling_datasets import train_dataset,test_dataset
discriminador.fit(train_dataset.shuffle(70000).repeat().batch(1000),
epochs=1,steps_per_epoch=1,
validation_data=test_dataset.shuffle(70000).batch(1000),
validation_steps=1)
# Train for 1 steps, validate for 1 steps
# 1/1 [==============================] - 3s 3s/step - loss: 0.1502 - accuracy: 0.9490 - val_loss: 0.1374 - val_accuracy: 0.9550
Or evaluate your model over the whole dataset:
discriminador.evaluate(test_dataset.batch(TST_DSET_LENGTH))
#> [0.131587415933609, 0.95354694]
Applied to a CNN, you can write:
x_len=24*24
y_len=2
x = tf.placeholder(tf.float32, shape=[None, x_len], name='input')
fc1 = ... # cnn's fully connected layer
keep_prob = tf.placeholder(tf.float32, name='keep_prob')
layer_fc_dropout = tf.nn.dropout(fc1, keep_prob, name='dropout')
y_pred = tf.nn.softmax(fc1, name='output')
logits = tf.argmax(y_pred, axis=1)
y_true = tf.placeholder(tf.float32, shape=[None, y_len], name='y_true')
acc, acc_op = tf.metrics.accuracy(labels=tf.argmax(y_true, axis=1), predictions=tf.argmax(y_pred, 1))
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
def print_accuracy(x_data, y_data, dropout=1.0):
    accuracy = sess.run(acc_op, feed_dict={y_true: y_data, x: x_data, keep_prob: dropout})
    print('Accuracy: ', accuracy)
Extending the answer to TF2.0, the tutorial here explains clearly how to use tf.metrics for accuracy and loss.
https://www.tensorflow.org/beta/tutorials/quickstart/advanced
Notice that it mentions that the metrics are reset after each epoch:
train_loss.reset_states()
train_accuracy.reset_states()
test_loss.reset_states()
test_accuracy.reset_states()
When labels and predictions are one-hot encoded:
def train_step(features, labels):
    with tf.GradientTape() as tape:
        predictions = model(features)
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=predictions))
    gradients = tape.gradient(loss, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))
    train_loss(loss)
    train_accuracy(tf.argmax(labels, 1), tf.argmax(predictions, 1))
Here is how I use it:
test_accuracy = tf.keras.metrics.Accuracy()
# use dataset api or normal dataset from lists/np arrays
ds_test_batch = zip(x_test,y_test)
predicted_classes = np.array([])
for (x, y) in ds_test_batch:
    # training=False is needed only if there are layers with different
    # behaviour during training versus inference (e.g. Dropout).
    # Adjust the input shape to match the input you used during training.
    logits = model(x.reshape(1, -1), training=False)
    prediction = tf.argmax(logits, axis=1, output_type=tf.int64)
    predicted_classes = np.concatenate([predicted_classes, prediction.numpy()])
    test_accuracy(prediction, y)

print("Test set accuracy: {:.3%}".format(test_accuracy.result()))

Return all possible prediction values

This neural network trains on inputs [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]] with labelled outputs: [[0.0], [1.0], [1.0], [0.0]]
import numpy as np
import tensorflow as tf
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)
# a batch of inputs of 2 value each
inputs = tf.placeholder(tf.float32, shape=[None, 2])
# a batch of output of 1 value each
desired_outputs = tf.placeholder(tf.float32, shape=[None, 1])
# [!] define the number of hidden units in the first layer
HIDDEN_UNITS = 4
weights_1 = tf.Variable(tf.truncated_normal([2, HIDDEN_UNITS]))
biases_1 = tf.Variable(tf.zeros([HIDDEN_UNITS]))
# connect 2 inputs to every hidden unit. Add bias
layer_1_outputs = tf.nn.sigmoid(tf.matmul(inputs, weights_1) + biases_1)
print(layer_1_outputs)
NUMBER_OUTPUT_NEURONS = 1
biases_2 = tf.Variable(tf.zeros([NUMBER_OUTPUT_NEURONS]))
weights_2 = tf.Variable(tf.truncated_normal([HIDDEN_UNITS, NUMBER_OUTPUT_NEURONS]))
finalLayerOutputs = tf.nn.sigmoid(tf.matmul(layer_1_outputs, weights_2) + biases_2)
tf.global_variables_initializer().run()
logits = tf.nn.sigmoid(tf.matmul(layer_1_outputs, weights_2) + biases_2)
training_inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
training_outputs = [[0.0], [1.0], [1.0], [0.0]]
error_function = 0.5 * tf.reduce_sum(tf.sub(logits, desired_outputs) * tf.sub(logits, desired_outputs))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(error_function)
for i in range(15):
    _, loss = sess.run([train_step, error_function],
                       feed_dict={inputs: np.array(training_inputs),
                                  desired_outputs: np.array(training_outputs)})
print(sess.run(logits, feed_dict={inputs: np.array([[0.0, 1.0]])}))
After training, this network returns [[ 0.61094815]] for the input [[0.0, 1.0]].
Is [[ 0.61094815]] the value with the highest probability that the trained network assigns to the input [[0.0, 1.0]]? Can the lower-probability values also be accessed, and not just the most probable one?
If I increase the number of training epochs I'll get a better prediction, but in this case I just want to access all potential values with their probabilities for a given input.
Update:
I have updated the code to use multi-class classification with softmax. But the prediction for [[0.0, 1.0, 0.0, 0.0]] is [array([0])]. Have I updated it correctly?
import numpy as np
import tensorflow as tf
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)
# a batch of inputs of 4 values each
inputs = tf.placeholder(tf.float32, shape=[None, 4])
# a batch of outputs of 3 values each
desired_outputs = tf.placeholder(tf.float32, shape=[None, 3])
# [!] define the number of hidden units in the first layer
HIDDEN_UNITS = 4
weights_1 = tf.Variable(tf.truncated_normal([4, HIDDEN_UNITS]))
biases_1 = tf.Variable(tf.zeros([HIDDEN_UNITS]))
# connect 4 inputs to every hidden unit. Add bias
layer_1_outputs = tf.nn.softmax(tf.matmul(inputs, weights_1) + biases_1)
biases_2 = tf.Variable(tf.zeros([3]))
weights_2 = tf.Variable(tf.truncated_normal([HIDDEN_UNITS, 3]))
finalLayerOutputs = tf.nn.softmax(tf.matmul(layer_1_outputs, weights_2) + biases_2)
tf.global_variables_initializer().run()
logits = tf.nn.softmax(tf.matmul(layer_1_outputs, weights_2) + biases_2)
training_inputs = [[0.0, 0.0 , 0.0, 0.0], [0.0, 1.0 , 0.0, 0.0], [1.0, 0.0 , 0.0, 0.0], [1.0, 1.0 , 0.0, 0.0]]
training_outputs = [[0.0,0.0,0.0], [1.0,0.0,0.0], [1.0,0.0,0.0], [0.0,0.0,1.0]]
error_function = 0.5 * tf.reduce_sum(tf.sub(logits, desired_outputs) * tf.sub(logits, desired_outputs))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(error_function)
for i in range(15):
    _, loss = sess.run([train_step, error_function],
                       feed_dict={inputs: np.array(training_inputs),
                                  desired_outputs: np.array(training_outputs)})
prediction=tf.argmax(logits,1)
best = sess.run([prediction],feed_dict={inputs: np.array([[0.0, 1.0, 0.0, 0.0]])})
print(best)
Which prints [array([0])]
Update 2:
Replacing
prediction=tf.argmax(logits,1)
best = sess.run([prediction],feed_dict={inputs: np.array([[0.0, 1.0, 0.0, 0.0]])})
print(best)
With:
prediction=tf.nn.softmax(logits)
best = sess.run([prediction],feed_dict={inputs: np.array([[0.0, 1.0, 0.0, 0.0]])})
print(best)
This appears to fix the issue.
So now the full source is:
import numpy as np
import tensorflow as tf
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)
# a batch of inputs of 4 values each
inputs = tf.placeholder(tf.float32, shape=[None, 4])
# a batch of outputs of 3 values each
desired_outputs = tf.placeholder(tf.float32, shape=[None, 3])
# [!] define the number of hidden units in the first layer
HIDDEN_UNITS = 4
weights_1 = tf.Variable(tf.truncated_normal([4, HIDDEN_UNITS]))
biases_1 = tf.Variable(tf.zeros([HIDDEN_UNITS]))
# connect 4 inputs to every hidden unit. Add bias
layer_1_outputs = tf.nn.softmax(tf.matmul(inputs, weights_1) + biases_1)
biases_2 = tf.Variable(tf.zeros([3]))
weights_2 = tf.Variable(tf.truncated_normal([HIDDEN_UNITS, 3]))
finalLayerOutputs = tf.nn.softmax(tf.matmul(layer_1_outputs, weights_2) + biases_2)
tf.global_variables_initializer().run()
logits = tf.nn.softmax(tf.matmul(layer_1_outputs, weights_2) + biases_2)
training_inputs = [[0.0, 0.0 , 0.0, 0.0], [0.0, 1.0 , 0.0, 0.0], [1.0, 0.0 , 0.0, 0.0], [1.0, 1.0 , 0.0, 0.0]]
training_outputs = [[0.0,0.0,0.0], [1.0,0.0,0.0], [1.0,0.0,0.0], [0.0,0.0,1.0]]
error_function = 0.5 * tf.reduce_sum(tf.sub(logits, desired_outputs) * tf.sub(logits, desired_outputs))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(error_function)
for i in range(1500):
    _, loss = sess.run([train_step, error_function],
                       feed_dict={inputs: np.array(training_inputs),
                                  desired_outputs: np.array(training_outputs)})
prediction=tf.nn.softmax(logits)
best = sess.run([prediction],feed_dict={inputs: np.array([[0.0, 1.0, 0.0, 0.0]])})
print(best)
Which prints
[array([[ 0.49810624, 0.24845563, 0.25343812]], dtype=float32)]
Your current network does (logistic) regression, not really classification: given an input x, it tries to evaluate f(x) (where f(x) = x1 XOR x2 here, but the network does not know that before training), which is regression. To do so, it learns a function f1(x) and tries to have it be as close to f(x) as possible on all your training samples. [[ 0.61094815]] is just the value of f1([[0.0, 1.0]]). In this setting, there is no such thing as a "probability to be in a class", since there is no class. There is only the user (you) choosing to interpret f1(x) as the probability of the output being 1. Since you have only 2 classes, that tells you that the probability of the other class is 1 - 0.61094815 (that is, you are doing classification with the output of the network, but the network is not really trained to do that in itself). Used this way, it is a widely used trick to perform classification, but it only works if you have 2 classes.
A real network for classification would be built a bit differently: your logits would be of shape (batch_size, number_of_classes), so (1, 2) in your case; you apply a softmax on them, and then the prediction is argmax(softmax), with probability max(softmax). Then you can also get the probability of each output, according to the network: probability(class i) = softmax[i]. Here the network is really trained to learn the probability of x being in each class.
I'm sorry if my explanation is obscure, or if the difference between regression between 0 and 1 and classification seems philosophical in a setting with 2 classes, but if you add more classes you'll probably see what I mean.
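To make that concrete, here is a minimal sketch of such a classification head, reusing layer_1_outputs and HIDDEN_UNITS from your first snippet (the new variable names are hypothetical):
# Two output units, one per class; softmax turns them into probabilities.
class_weights = tf.Variable(tf.truncated_normal([HIDDEN_UNITS, 2]))
class_biases = tf.Variable(tf.zeros([2]))
class_logits = tf.matmul(layer_1_outputs, class_weights) + class_biases  # shape (batch, 2)
class_probs = tf.nn.softmax(class_logits)       # probability of each class, rows sum to 1
predicted_class = tf.argmax(class_probs, 1)     # index of the most likely class
predicted_prob = tf.reduce_max(class_probs, 1)  # its probability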
EDIT
Answer to your 2 updates.
In your training samples, the labels (training_outputs) must be probability distributions, i.e. they must sum to 1 for each sample (99% of the time they are of the form (1, 0, 0), (0, 1, 0) or (0, 0, 1)), so your first output [0.0, 0.0, 0.0] is not valid. If you want to learn XOR on the first two inputs, then the 1st output should be the same as the last: [0.0, 0.0, 1.0].
prediction = tf.argmax(logits, 1) giving [array([0])] is completely normal: logits contains your probabilities, and prediction is the class with the biggest probability, which in your case is class 0: in your training set, [0.0, 1.0, 0.0, 0.0] is associated with output [1.0, 0.0, 0.0], i.e. it is of class 0 with probability 1, and of the other classes with probability 0. After enough training, print(best) with prediction = tf.argmax(logits, 1) on input [1.0, 1.0, 0.0, 0.0] should give you [array([2])], 2 being the index of the class for this input in your training set.
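And to read every class probability (not just the argmax) from your updated code, something like this should work, since logits there is already a softmax output:
prediction = tf.argmax(logits, 1)
probs, best = sess.run([logits, prediction],
                       feed_dict={inputs: np.array([[0.0, 1.0, 0.0, 0.0]])})
print(probs)  # the probability of every class, e.g. [[0.498..., 0.248..., 0.253...]]
print(best)   # the index of the most likely class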