import numpy as np
import tensorflow as tf

sequence_size = [4, 2, 3]  # batch_size: 4, num_steps: 2, embedding_size: 3
num_units = 2
dummy_sequences = np.array([[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
                            [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
                            [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
                            [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]])
fw_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
bw_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
inputs = tf.placeholder(dtype=tf.float64, shape=sequence_size)
encoder_outputs, encoder_state = tf.nn.bidirectional_dynamic_rnn(cell_fw=fw_cell, cell_bw=bw_cell,
                                                                 inputs=inputs, sequence_length=sequence_size,
                                                                 dtype=tf.float64)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    output, state = sess.run([encoder_outputs, encoder_state], feed_dict={inputs: dummy_sequences})
    print(output, state)
I coded an example to test the usage of RNNs in TensorFlow, and I ran into a problem with the parameter sequence_length. If I remove the parameter sequence_length, the code runs correctly. So, what is the correct way to set sequence_length? It confuses me because I have already set sequence_length in the order batch_size, num_steps, embedding_size. Thanks a lot for your answer.
And the error is as follows:
ValueError: Dimension 0 in both shapes must be equal, but are 3 and 4 for 'bidirectional_rnn/fw/fw/while/Select' (op: 'Select') with input shapes: [3], [?,2], [4,2].
The parameter sequence_length expects the number of timesteps each sample should be processed for (num_steps in your example above). sequence_length must be a vector of length batch_size.
It is not the shape of the input. Suppose your input is as in dummy_sequences, that is [4, 2, 3]: you have 4 samples, each 2 timesteps long, with each timestep represented by 3 values.
Hence your sequence_length is [2, 2, 2, 2], as shown below. If all samples are of the same length you can omit this parameter. Otherwise, the network outputs a zero vector for every timestep of a sample beyond that sample's own length.
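For illustration, here is a minimal, self-contained sketch of the corrected call (the variable names are just for this example; the only substantive change from the question's code is the sequence_length argument):

import numpy as np
import tensorflow as tf

batch_size, num_steps, embedding_size, num_units = 4, 2, 3, 2
data = np.ones((batch_size, num_steps, embedding_size))
# One entry per sample: how many timesteps of that sample are valid.
seq_len = np.array([2, 2, 2, 2])

fw_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
bw_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
inputs = tf.placeholder(tf.float64, shape=[batch_size, num_steps, embedding_size])
outputs, state = tf.nn.bidirectional_dynamic_rnn(
    cell_fw=fw_cell, cell_bw=bw_cell, inputs=inputs,
    sequence_length=seq_len, dtype=tf.float64)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(outputs, feed_dict={inputs: data}))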
I've written a simple program to calculate a quadratic equation with TensorFlow. Now I'd like to adapt the code to run on the Coral Dev Board by using TensorFlow Lite.
The following code shows the generation of the .tflite file:
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Define and compile the neural network
model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer='sgd', loss='mean_squared_error')
# Provide the data
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)
# Generation TFLite Model
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
# Save the TFLite-Model
with open('mobilenet_v2_1.0_224.tflite', 'wb') as f:
f.write(tflite_model)
This code runs on the Coral Dev Board:
import numpy as np
import tflite_runtime.interpreter as tflite

# Load TFLite model and allocate tensors.
interpreter = tflite.Interpreter(model_path="mobilenet_v2_1.0_224.tflite")
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Test model on random input data.
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=np.float32)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], xs, ys)
...
The last line of code raises an error:
TypeError: set_tensor() takes 3 positional arguments but 4 were given
The output of 'input_details[0]['index']':
{'name': 'serving_default_dense_input:0',
'index': 0,
'shape': array([1, 1], dtype=int32),
'shape_signature': array([-1, 1], dtype=int32),
'dtype': <class 'numpy.float32'>,
'quantization': (0.0, 0),
'quantization_parameters':
{'scales': array([], dtype=float32),
'zero_points': array([], dtype=int32),
'quantized_dimension': 0},
'sparsity_parameters': {}
}
I don't understand the cause of the error. Does anyone have any idea?
Your error is the following: you are passing too many values to the set_tensor method.
When Python reads that line of code, it raises a TypeError, because set_tensor() only accepts a tensor index and a single value, not two arrays at once. That is the cause of your error.
Now, to fix your code: the set_tensor method expects the index of the given tensor as its first argument. What you want to pass there is the index of your tensor, which, as the data returned by interpreter.get_input_details() shows, is 0.
Also, you are supposed to set only one of the arrays, either the test data or the training data, not both at the same time. So eliminate either the xs or the ys variable from the call.
So just rewrite that line like this:
interpreter.set_tensor(0, ys)
I hope this gets it right. It is usually good to also take a look at the documentation, so you understand what each method expects: https://www.tensorflow.org/api_docs/python/tf/lite/Interpreter#set_tensor
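As a minimal sketch, a single inference with the interpreter generally looks like the following (this assumes the [1, 1] float32 input shape reported in input_details; the value 10.0 is just an example input):

import numpy as np
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(model_path="mobilenet_v2_1.0_224.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed one X value at a time, shaped [1, 1] and cast to float32.
x = np.array([[10.0]], dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], x)
interpreter.invoke()
y = interpreter.get_tensor(output_details[0]['index'])
print(y)  # the model's prediction for x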
My approach was wrong. xs contains the X values and ys contains the Y values (the result values) of the quadratic equation. I was not aware that you cannot do training in TFLite. But thanks for the effort anyway.
I'm currently building a GAN with TensorFlow 2 and Keras, and I noticed that a lot of the existing neural networks for the generator and discriminator use Conv2D and Conv2DTranspose in Keras.
I'm struggling to find something that explains the functional difference between the two. Can anyone explain what these two different options for building a NN in Keras mean?
Conv2D applies a convolution operation to the input. In contrast, Conv2DTranspose applies a transposed convolution (often called a deconvolution) to the input.
For example:
import tensorflow as tf

x = tf.random.uniform((1,3,3,1))
conv2d = tf.keras.layers.Conv2D(1,2)(x)
print(conv2d.shape)
# (1, 2, 2, 1)
conv2dTranspose = tf.keras.layers.Conv2DTranspose(1,2)(x)
print(conv2dTranspose.shape)
# (1, 4, 4, 1)
Conv2D is mainly used when you want to detect features, e.g., in the encoder part of an autoencoder model, and it may shrink your input shape.
Conversely, Conv2DTranspose is used for creating features, for example, in the decoder part of an autoencoder model for constructing an image. As you can see in the above code, it makes the input shape larger.
For example:
import numpy as np
import tensorflow as tf

kernel = tf.constant_initializer(1.)
x = tf.ones((1,3,3,1))
conv = tf.keras.layers.Conv2D(1,2, kernel_initializer=kernel)
y = tf.ones((1,2,2,1))
de_conv = tf.keras.layers.Conv2DTranspose(1,2, kernel_initializer=kernel)
conv_output = conv(x)
print("Convolution\n---------")
print("input shape:",x.shape)
print("output shape:",conv_output.shape)
print("input tensor:",np.squeeze(x.numpy()).tolist())
print("output tensor:",np.around(np.squeeze(conv_output.numpy())).tolist())
'''
Convolution
---------
input shape: (1, 3, 3, 1)
output shape: (1, 2, 2, 1)
input tensor: [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
output tensor: [[4.0, 4.0], [4.0, 4.0]]
'''
de_conv_output = de_conv(y)
print("De-Convolution\n------------")
print("input shape:",y.shape)
print("output shape:",de_conv_output.shape)
print("input tensor:",np.squeeze(y.numpy()).tolist())
print("output tensor:",np.around(np.squeeze(de_conv_output.numpy())).tolist())
'''
De-Convolution
------------
input shape: (1, 2, 2, 1)
output shape: (1, 3, 3, 1)
input tensor: [[1.0, 1.0], [1.0, 1.0]]
output tensor: [[1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 1.0]]
'''
To sum up:
Conv2D:
May shrink your input
For detecting features
Conv2DTranspose:
Enlarges your input
For constructing features
And if you want to know how Conv2DTranspose enlarges the input, see the sketch below: with padding='valid', the output size along each spatial axis is (input_size - 1) * strides + kernel_size.
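Here is a small sketch of that size arithmetic (assuming padding='valid', strides=2, and an all-ones kernel, chosen only for readability):

import tensorflow as tf

x = tf.ones((1, 2, 2, 1))  # input: 2x2
deconv = tf.keras.layers.Conv2DTranspose(
    filters=1, kernel_size=2, strides=2, padding='valid',
    kernel_initializer=tf.constant_initializer(1.))
y = deconv(x)
# Each input value is spread over a kernel-sized patch of the output
# (overlapping patches are summed; with strides=2 here they do not overlap).
print(y.shape)  # (1, 4, 4, 1): (2 - 1) * 2 + 2 = 4 along each spatial axis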
Predictions are only successful when providing a single instance in instance.json.
Test 1: Contents of instance.json:
{"serving_input": [20.0, 0.0, 1.0 ... 0.16474569041197143, 0.04138248072194471], "prediction_id": 0, "keep_prob": 1.0}
Prediction (same output for local and online prediction)
gcloud ml-engine local predict --model-dir=./model_dir --json-instances=instances.json
Output:
SERVING_OUTPUT ARGMAX PREDICTION_ID SCORES TOP_K
[-340.6920166015625, -1153.0877685546875] 0 0 [1.0, 0.0] [1.0, 0.0]
Test 2: Contents of instance.json:
{"serving_input": [20.0, 0.0, 1.0 ... 0.16474569041197143, 0.04138248072194471], "prediction_id": 0, "keep_prob": 1.0}
{"serving_input": [21.0, 2.0, 3.0 ... 3.14159265359, 0.04138248072194471], "prediction_id": 1, "keep_prob": 1.0}
Output:
.. Incompatible shapes: [2] vs. [2,108] .. (_arg_keep_prob_0_1, Model/dropout/random_uniform)
Here 108 is the size of the first hidden layer (net_dim=[2015,108,2]). (The network uses tf.nn.dropout, hence the keep_prob=1.0.)
Exporting code:
probabilities = tf.nn.softmax(self.out_layer)
top_k, _ = tf.nn.top_k(probabilities, self.network_dim[-1])

prediction_signature = (
    tf.saved_model.signature_def_utils.predict_signature_def(
        inputs={'serving_input': self.x, 'keep_prob': self.keep_prob,
                'prediction_id': self.prediction_id_in},
        outputs={'serving_output': self.out_layer, 'argmax': tf.argmax(self.out_layer, 1),
                 'prediction_id': self.prediction_id_out, 'scores': probabilities, 'top_k': top_k}))

builder.add_meta_graph_and_variables(
    sess,
    tags=[tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
            prediction_signature
    },
    main_op=tf.saved_model.main_op.main_op())
builder.save()
How can I format instances.json to perform a batched prediction (a prediction with multiple input instances)?
The problem is not in the JSON. Check how you are using self.x.
I think your code assumes that it is a 1D array, when you should treat it as a tensor of shape [?, 108], as in the sketch below.
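A minimal sketch of the distinction the answer is pointing at (the names and sizes here are illustrative, not taken from the original model):

import numpy as np
import tensorflow as tf

feature_dim = 108  # the exact width does not matter; the leading batch dimension does

# Batch-friendly: the leading dimension is the (unknown) batch size, so the same
# graph serves one instance ([1, feature_dim]) or many ([N, feature_dim]).
x_batched = tf.placeholder(tf.float32, shape=[None, feature_dim])

# Not batch-friendly: a rank-1 placeholder can only ever hold a single instance,
# so feeding two instances from instances.json fails with a shape mismatch.
x_single = tf.placeholder(tf.float32, shape=[feature_dim])

with tf.Session() as sess:
    two_instances = np.zeros((2, feature_dim), dtype=np.float32)
    print(sess.run(tf.shape(x_batched), feed_dict={x_batched: two_instances}))  # [  2 108]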
How can the following be achieved in tensorflow, when A is a tf.SparseTensor and b is a tf.Variable?
A = np.arange(5**2).reshape((5,5))
b = np.array([1.0, 2.0, 0.0, 0.0, 1.0])
C = A * b
If I try the same notation, I get InvalidArgumentError: Provided indices are out-of-bounds w.r.t. dense side with broadcasted shape.
* works for SparseTensor as well; your problem seems to be related to the SparseTensor itself. You might have provided indices that are out of the range of the dense_shape you gave it. Consider this example:
A_t = tf.SparseTensor(indices=[[0,6],[4,4]], values=[3.2,5.1], dense_shape=(5,5))
Notice that the column index 6 is larger than the specified shape allows (at most 5 columns, i.e. indices 0 through 4), and this gives the same error you have shown:
b = np.array([1.0, 2.0, 0.0, 0.0, 1.0])
B_t = tf.Variable(b, dtype=tf.float32)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(A_t * B_t))
InvalidArgumentError (see above for traceback): Provided indices are
out-of-bounds w.r.t. dense side with broadcasted shape
Here is a working example:
A_t = tf.SparseTensor(indices=[[0,3],[4,4]], values=[3.2,5.1], dense_shape=(5,5))
b = np.array([1.0, 2.0, 0.0, 0.0, 1.0])
B_t = tf.Variable(b, dtype=tf.float32)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(A_t * B_t))
# SparseTensorValue(indices=array([[0, 3],
# [4, 4]], dtype=int64), values=array([ 0. , 5.0999999], dtype=float32), dense_shape=array([5, 5], dtype=int64))
This neural network trains on the inputs [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]] with labelled outputs [[0.0], [1.0], [1.0], [0.0]].
import numpy as np
import tensorflow as tf
sess = tf.InteractiveSession()
# a batch of inputs of 2 value each
inputs = tf.placeholder(tf.float32, shape=[None, 2])
# a batch of output of 1 value each
desired_outputs = tf.placeholder(tf.float32, shape=[None, 1])
# [!] define the number of hidden units in the first layer
HIDDEN_UNITS = 4
weights_1 = tf.Variable(tf.truncated_normal([2, HIDDEN_UNITS]))
biases_1 = tf.Variable(tf.zeros([HIDDEN_UNITS]))
# connect 2 inputs to every hidden unit. Add bias
layer_1_outputs = tf.nn.sigmoid(tf.matmul(inputs, weights_1) + biases_1)
print(layer_1_outputs)
NUMBER_OUTPUT_NEURONS = 1
biases_2 = tf.Variable(tf.zeros([NUMBER_OUTPUT_NEURONS]))
weights_2 = tf.Variable(tf.truncated_normal([HIDDEN_UNITS, NUMBER_OUTPUT_NEURONS]))
finalLayerOutputs = tf.nn.sigmoid(tf.matmul(layer_1_outputs, weights_2) + biases_2)
tf.global_variables_initializer().run()
logits = tf.nn.sigmoid(tf.matmul(layer_1_outputs, weights_2) + biases_2)
training_inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
training_outputs = [[0.0], [1.0], [1.0], [0.0]]
error_function = 0.5 * tf.reduce_sum(tf.subtract(logits, desired_outputs) * tf.subtract(logits, desired_outputs))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(error_function)
for i in range(15):
    _, loss = sess.run([train_step, error_function],
                       feed_dict={inputs: np.array(training_inputs),
                                  desired_outputs: np.array(training_outputs)})
print(sess.run(logits, feed_dict={inputs: np.array([[0.0, 1.0]])}))
After training, this network returns [[ 0.61094815]] for the input [[0.0, 1.0]].
Is [[ 0.61094815]] the value with the highest probability that the trained network assigns to the input [[0.0, 1.0]]? Can the lower-probability values also be accessed, and not just the most probable one?
If I increase the number of training epochs I will get a better prediction, but in this case I just want to access all potential values with their probabilities for a given input.
Update :
I have updated the code to use multi-class classification with softmax, but the prediction for [[0.0, 1.0, 0.0, 0.0]] is [array([0])]. Have I updated it correctly?
import numpy as np
import tensorflow as tf
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)
# a batch of inputs of 4 values each
inputs = tf.placeholder(tf.float32, shape=[None, 4])
# a batch of outputs of 3 values each
desired_outputs = tf.placeholder(tf.float32, shape=[None, 3])
# [!] define the number of hidden units in the first layer
HIDDEN_UNITS = 4
weights_1 = tf.Variable(tf.truncated_normal([4, HIDDEN_UNITS]))
biases_1 = tf.Variable(tf.zeros([HIDDEN_UNITS]))
# connect the 4 inputs to every hidden unit. Add bias
layer_1_outputs = tf.nn.softmax(tf.matmul(inputs, weights_1) + biases_1)
biases_2 = tf.Variable(tf.zeros([3]))
weights_2 = tf.Variable(tf.truncated_normal([HIDDEN_UNITS, 3]))
finalLayerOutputs = tf.nn.softmax(tf.matmul(layer_1_outputs, weights_2) + biases_2)
tf.global_variables_initializer().run()
logits = tf.nn.softmax(tf.matmul(layer_1_outputs, weights_2) + biases_2)
training_inputs = [[0.0, 0.0 , 0.0, 0.0], [0.0, 1.0 , 0.0, 0.0], [1.0, 0.0 , 0.0, 0.0], [1.0, 1.0 , 0.0, 0.0]]
training_outputs = [[0.0,0.0,0.0], [1.0,0.0,0.0], [1.0,0.0,0.0], [0.0,0.0,1.0]]
error_function = 0.5 * tf.reduce_sum(tf.subtract(logits, desired_outputs) * tf.subtract(logits, desired_outputs))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(error_function)
for i in range(15):
    _, loss = sess.run([train_step, error_function],
                       feed_dict={inputs: np.array(training_inputs),
                                  desired_outputs: np.array(training_outputs)})
prediction=tf.argmax(logits,1)
best = sess.run([prediction],feed_dict={inputs: np.array([[0.0, 1.0, 0.0, 0.0]])})
print(best)
Which prints [array([0])]
Update 2 :
Replacing
prediction=tf.argmax(logits,1)
best = sess.run([prediction],feed_dict={inputs: np.array([[0.0, 1.0, 0.0, 0.0]])})
print(best)
With :
prediction=tf.nn.softmax(logits)
best = sess.run([prediction],feed_dict={inputs: np.array([[0.0, 1.0, 0.0, 0.0]])})
print(best)
Appears to fix issue.
So now full source is :
import numpy as np
import tensorflow as tf
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)
# a batch of inputs of 4 values each
inputs = tf.placeholder(tf.float32, shape=[None, 4])
# a batch of outputs of 3 values each
desired_outputs = tf.placeholder(tf.float32, shape=[None, 3])
# [!] define the number of hidden units in the first layer
HIDDEN_UNITS = 4
weights_1 = tf.Variable(tf.truncated_normal([4, HIDDEN_UNITS]))
biases_1 = tf.Variable(tf.zeros([HIDDEN_UNITS]))
# connect the 4 inputs to every hidden unit. Add bias
layer_1_outputs = tf.nn.softmax(tf.matmul(inputs, weights_1) + biases_1)
biases_2 = tf.Variable(tf.zeros([3]))
weights_2 = tf.Variable(tf.truncated_normal([HIDDEN_UNITS, 3]))
finalLayerOutputs = tf.nn.softmax(tf.matmul(layer_1_outputs, weights_2) + biases_2)
tf.global_variables_initializer().run()
logits = tf.nn.softmax(tf.matmul(layer_1_outputs, weights_2) + biases_2)
training_inputs = [[0.0, 0.0 , 0.0, 0.0], [0.0, 1.0 , 0.0, 0.0], [1.0, 0.0 , 0.0, 0.0], [1.0, 1.0 , 0.0, 0.0]]
training_outputs = [[0.0,0.0,0.0], [1.0,0.0,0.0], [1.0,0.0,0.0], [0.0,0.0,1.0]]
error_function = 0.5 * tf.reduce_sum(tf.subtract(logits, desired_outputs) * tf.subtract(logits, desired_outputs))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(error_function)
for i in range(1500):
    _, loss = sess.run([train_step, error_function],
                       feed_dict={inputs: np.array(training_inputs),
                                  desired_outputs: np.array(training_outputs)})
prediction=tf.nn.softmax(logits)
best = sess.run([prediction],feed_dict={inputs: np.array([[0.0, 1.0, 0.0, 0.0]])})
print(best)
Which prints
[array([[ 0.49810624, 0.24845563, 0.25343812]], dtype=float32)]
Your current network does (logistic) regression, not really classification: given an input x, it tries to evaluate f(x) (where f(x) = x1 XOR x2 here, but the network does not know that before training), which is regression. To do so, it learns a function f1(x) and tries to make it as close as possible to f(x) on all your training samples. [[ 0.61094815]] is just the value of f1([[0.0, 1.0]]). In this setting, there is no such thing as a "probability to be in a class", since there is no class; there is only you, the user, choosing to interpret f1(x) as the probability of the output being 1. Since you have only 2 classes, that tells you that the probability of the other class is 1 - 0.61094815 (that is, you are doing classification with the output of the network, but the network is not really trained to do that in itself). Used this way, it is a widely used trick to perform classification, but it only works if you have 2 classes.
A real network for classification would be built a bit differently: your logits would be of shape (batch_size, number_of_classes), so (1, 2) in your case; you apply a softmax on them, and then the prediction is argmax(softmax), with probability max(softmax). Then you can also get the probability of each output according to the network: probability(class i) = softmax[i]. Here the network is really trained to learn the probability of x being in each class.
I'm sorry if my explanation is obscure, or if the difference between regression between 0 and 1 and classification seems philosophical in a setting with 2 classes, but if you add more classes you will probably see what I mean. A sketch of the classification setup described above follows below.
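As a minimal sketch of that setup (illustrative code in the same TF1 style as the question, not the asker's exact network):

import numpy as np
import tensorflow as tf

NUM_CLASSES = 2
inputs = tf.placeholder(tf.float32, shape=[None, 2])
labels = tf.placeholder(tf.int64, shape=[None])    # class indices, e.g. 0 or 1

hidden = tf.layers.dense(inputs, 4, activation=tf.nn.sigmoid)
logits = tf.layers.dense(hidden, NUM_CLASSES)       # shape (batch_size, NUM_CLASSES)

probabilities = tf.nn.softmax(logits)               # probability of every class
prediction = tf.argmax(probabilities, axis=1)       # most probable class

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

xor_inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
xor_labels = [0, 1, 1, 0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(2000):
        sess.run(train_step, feed_dict={inputs: xor_inputs, labels: xor_labels})
    # All class probabilities and the argmax prediction for one input:
    print(sess.run([probabilities, prediction], feed_dict={inputs: [[0.0, 1.0]]}))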
EDIT
Answers to your two updates:
In your training samples, the labels (training_outputs) must be probability distributions, i.e. they must sum to 1 for each sample (99% of the time they are of the form (1, 0, 0), (0, 1, 0) or (0, 0, 1)), so your first output [0.0, 0.0, 0.0] is not valid. If you want to learn XOR on the first two inputs, then the first output should be the same as the last: [0.0, 0.0, 1.0].
prediction=tf.argmax(logits,1) = [array([0])] is completely normal: logits contains your probabilities, and prediction is the class with the biggest probability, which in your case is class 0. In your training set, [0.0, 1.0, 0.0, 0.0] is associated with the output [1.0, 0.0, 0.0], i.e. it is of class 0 with probability 1 and of the other classes with probability 0. After enough training, print(best) with prediction=tf.argmax(logits,1) on the input [1.0, 1.0, 0.0, 0.0] should give you [array([2])], 2 being the index of the class for this input in your training set.
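For instance, following the point above, a valid one-hot label set would be:

# A valid one-hot label set for XOR on the first two inputs (assuming, as in the
# question's own labels, that class 0 means XOR = 1 and class 2 means XOR = 0).
training_outputs = [[0.0, 0.0, 1.0],   # XOR(0, 0) = 0 -> class 2
                    [1.0, 0.0, 0.0],   # XOR(0, 1) = 1 -> class 0
                    [1.0, 0.0, 0.0],   # XOR(1, 0) = 1 -> class 0
                    [0.0, 0.0, 1.0]]   # XOR(1, 1) = 0 -> class 2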