About the input of quantum neural network in tensorflow quantum - tensorflow2.0

I created a quantum neural network using TensorFlow Quantum. Its input is a tensor converted from a circuit. I found that if the parameters of this input circuit are themselves specified by tensors, the quantum neural network cannot be trained.
When the circuit uses ordinary numeric parameters, the network trains normally:
theta_g=1
blob_size = abs(1 - 4) / 5
spread_x = np.random.uniform(-blob_size, blob_size)
spread_y = np.random.uniform(-blob_size, blob_size)
angle = theta_g + spread_y
cir=cirq.Circuit(cirq.ry(-angle)(qubit), cirq.rx(-spread_x)(qubit))
discriminator_network(tfq.convert_to_tensor([cir]))
But when I use the following code, the quantum neural network cannot be trained:
theta_g=tf.constant([1])
blob_size = abs(1 - 4) / 5
spread_x = np.random.uniform(-blob_size, blob_size)
spread_y = np.random.uniform(-blob_size, blob_size)
spred_x = tf.constant(spread_x)
spred_y = tf.constant(spread_y)
angle = theta_g + spread_y
cir=cirq.Circuit(cirq.ry(-angle)(qubit), cirq.rx(-spread_x)(qubit))
discriminator_network(tfq.convert_to_tensor([cir]))
The discriminator network:
def discriminator():
    theta = sympy.Symbol('theta')
    q_model = cirq.Circuit(cirq.ry(theta)(qubit))
    q_data_input = tf.keras.Input(
        shape=(), dtype=tf.dtypes.string)
    expectation = tfq.layers.PQC(q_model, cirq.Z(qubit))
    expectation_output = expectation(q_data_input)
    classifier = tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid)
    classifier_output = classifier(expectation_output)
    model = tf.keras.Model(inputs=q_data_input, outputs=classifier_output)
    return model

Without being able to see the trace of the error you are getting, I think the problem in the second snippet is that you have passed tf.constant objects as the gate parameters of the cirq.Circuit. The first example works because Cirq knows how to interpret plain Python and NumPy numeric values. Cirq does not know how to interpret tf.float32 values (or any tf.dtypes.* for that matter).
TensorFlow Quantum's entry point for interfacing TensorFlow datatypes with cirq.Circuit objects is to resolve the sympy.Symbol values inside the circuits in tfq native operations (which you have done in creating the tfq.layers.PQC).
Does this help clear things up?
-Michael

Related

Extracting mean and std from MixtureNormal model in Tensorflow Probability

I'm currently using tensorflow probability to build an MDN to perform a regression problem. Everything works great, however, I would like to explore some properties of the model. Because I'm using a model with a mixture of gaussians, I should be able to see the mean and std of each gaussian component. Indeed, I can extract the weights from the model. It seems like there are three numbers from each gaussian component. I'm wondering which (if any) are the mean and std from the mixture of gaussians.
The model I am using is built as follows:
def keras_model_2gauss_mdn(n_variables, name='gauss2_mdn'):
    event_shape = [1]
    num_components = 2
    param_size = tfp.layers.MixtureNormal.params_size(num_components, event_shape)
    x_1 = tf.keras.Input(shape=n_variables)
    hidden_0 = tf.keras.layers.Dense(192, activation='relu')(x_1)
    hidden_1 = tf.keras.layers.Dense(192, activation='relu')(hidden_0)
    hidden_2 = tf.keras.layers.Dense(192, activation='relu')(hidden_1)
    hidden_3 = tf.keras.layers.Dense(128, activation='relu')(hidden_2)
    hidden_4 = tf.keras.layers.Dense(64, activation='relu')(hidden_3)
    hidden_5 = tf.keras.layers.Dense(param_size, activation=None)(hidden_4)
    output = tfp.layers.MixtureNormal(num_components, event_shape)(hidden_5)
    return tf.keras.Model(inputs=x_1, outputs=output, name=name)
After compiling and fitting (i.e. after training), I can get the weights from the whole model by calling .get_weights. By selecting the last vector from this output, I can get the weights of the MixtureNormal layer. This looks something like
array([ 0.09415845, -0.0941584 , -0.02495631, -0.05152947, -0.04510244,
-0.00484127], dtype=float32)
I suspect the first number in each group of three is the weight, the second is the mean, and the third is the std, but need some clarity on if this is actually the case.
Notice that I've also tried the solution given here and it doesn't seem to work for tfp.layers.MixtureNormal.
I'm rather new to ML and tensorflow, so any help is greatly appreciated!
The idea here is that when you pass an input to your network, you get a distribution back. To make things work nicely with Keras and other things you might do with the output of a NN, the resulting distribution is wrapped in something called _TensorCoercible. This means that when you pass the distribution into a TF op, the distribution will turn itself into a tensor. The default way of doing this is to sample the distribution, but it's configurable via the convert_to_tensor_fn argument that all TFP layers accept. E.g., you could use convert_to_tensor_fn=lambda dist: dist.mean() (or whatever you like!). Anyway, this means that when you invoke your model on some input, you don't directly get the MixtureSameFamily (Distribution!) instance underlying the MixtureNormal (TFP layer!) output -- you get a _TensorCoercible wrapper around it.
To get the MixtureSameFamily instance, look at the tensor_distribution member on the resulting _TensorCoercible object. It appears that, within the MixtureSameFamily instance, the mixture distribution is not a _TensorCoercible, but the components distribution is. Not sure why. Here's a runnable snippet adapted from your code:
import tensorflow as tf
import tensorflow_probability as tfp
n_variables=[1]
name='blah'
event_shape = [1]
num_components = 2
param_size = tfp.layers.MixtureNormal.params_size(num_components, event_shape)
x_1 = tf.keras.Input(shape=n_variables)
hidden_0 = tf.keras.layers.Dense(192, activation='relu')(x_1)
hidden_1 = tf.keras.layers.Dense(192, activation='relu')(hidden_0)
hidden_2 = tf.keras.layers.Dense(192, activation='relu')(hidden_1)
hidden_3 = tf.keras.layers.Dense(128, activation='relu')(hidden_2)
hidden_4 = tf.keras.layers.Dense(64, activation='relu')(hidden_3)
hidden_5 = tf.keras.layers.Dense(param_size, activation=None)(hidden_4)
output = tfp.layers.MixtureNormal(num_components, event_shape)(hidden_5)
model = tf.keras.Model(inputs=x_1, outputs=output, name=name)
model.compile()
dist = model(tf.constant([[1.]]))
print('mixture component logits: ',
      dist.tensor_distribution.mixture_distribution.logits.numpy())
print('mixture component means: ',
      dist.tensor_distribution.components_distribution.tensor_distribution.mean().numpy())
print('mixture component stddevs: ',
      dist.tensor_distribution.components_distribution.tensor_distribution.stddev().numpy())
Output:
mixture component logits: [[0.01587015 0.03365375]]
mixture component means: [[[ 0.04741365]
  [-0.01594907]]]
mixture component stddevs: [[[0.68762577]
  [0.687484 ]]]
HTH!
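On the original question about the six numbers: the last array returned by get_weights() is the bias of the final Dense(param_size) layer, since the MixtureNormal layer itself holds no trainable weights. The actual mixture parameters depend on the input. As a rough illustration of how a six-value parameter vector could be decoded, here is a minimal NumPy sketch. It assumes the layout [logits | means | raw scales] with a softplus applied to the raw scales, which is my reading of the TFP layer code and should be checked against your installed version. The function name decode_mixture_normal_params and the example vector are hypothetical.

```python
import numpy as np


def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def decode_mixture_normal_params(params, num_components):
    """Split one flat parameter vector for event_shape=[1].

    ASSUMED layout (not confirmed by the post above):
    [logits | means | raw scales], each of length num_components.
    """
    params = np.asarray(params, dtype=np.float64)
    logits = params[:num_components]
    means, raw_scales = np.split(params[num_components:], 2)
    # Mixture weights are the softmax of the logits.
    shifted = np.exp(logits - logits.max())
    weights = shifted / shifted.sum()
    return weights, means, softplus(raw_scales)


# Hypothetical parameter vector, chosen only for illustration.
example_params = [0.0, 0.0, 1.0, -1.0, 0.0, 0.0]
weights, means, stddevs = decode_mixture_normal_params(example_params, num_components=2)
print(weights, means, stddevs)
```

Under that assumption, a raw scale near zero maps to a standard deviation near log(2), about 0.69, which is at least consistent with the stddevs printed above for an untrained model.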

OpenVino converted model not returning same score values as original model (Sigmoid)

I've converted a Keras model for use with OpenVino. The original Keras model used sigmoid to return scores ranging from 0 to 1 for binary classification. After converting the model for use with OpenVino, the scores are all near 0.99 for both classes but seem slightly lower for one of the classes.
For example, test1.jpg and test2.jpg (from opposite classes) yield scores of 0.00320357 and 0.9999, respectively.
With OpenVino, the same images yield scores of 0.9998982 and 0.9962392, respectively.
Edit* One suspicion is that the input array is still accepted by the OpenVino model but is somehow changed in shape or "scrambled" and therefore is never a match for class one? In other words, if you fed it random noise, the score would also always be 0.9999. Maybe I'd have to somehow get the OpenVino model to accept the original shape (1,180,180,3) instead of (1,3,180,180) so I don't have to force the input into a different shape than the one the original model accepted? That's weird though because I specified the shape when making the xml and bin for openvino:
python3 /opt/intel/openvino_2021/deployment_tools/model_optimizer/mo_tf.py --saved_model_dir /Users/.../Desktop/.../model13 --output_dir /Users/.../Desktop/... --input_shape=\[1,180,180,3]
However, I know from error messages that the inference engine is expecting (1,3,180,180) for some unknown reason. Could that be the problem? The other suspicion is something wrong with how the original model was frozen. I'm exploring different ways to freeze the original model (keras model converted to pb) in case the problem is related to that.
I checked to make sure the Sigmoid activation function is being used in the OpenVino implementation (same activation as the Keras model) and it looks like it is. Why, then, are the values not the same? Any help would be much appreciated.
The code for the OpenVino inference is:
import openvino
from openvino.inference_engine import IECore, IENetwork
from skimage import io
import sys
import numpy as np
import os

def loadNetwork(model_xml, model_bin):
    ie = IECore()
    network = ie.read_network(model=model_xml, weights=model_bin)
    input_placeholder_key = list(network.input_info)[0]
    input_placeholder = network.input_info[input_placeholder_key]
    output_placeholder_key = list(network.outputs)[0]
    output_placeholder = network.outputs[output_placeholder_key]
    return network, input_placeholder_key, output_placeholder_key

batch_size = 1
channels = 3
IMG_HEIGHT = 180
IMG_WIDTH = 180
#loadNetwork('saved_model.xml','saved_model.bin')
image_path = 'test.jpg'

def load_source(path_to_image):
    image = io.imread(path_to_image)
    img = np.resize(image,(180,180))
    return img

img_new = load_source('test2.jpg')
#Batch?

def classify(image):
    device = 'CPU'
    network, input_placeholder_key, output_placeholder_key = loadNetwork('saved_model.xml','saved_model.bin')
    ie = IECore()
    exec_net = ie.load_network(network=network, device_name=device)
    res = exec_net.infer(inputs={input_placeholder_key: image})
    print(res)
    res = res[output_placeholder_key]
    return res

result = classify(img_new)
print(result)
result = result[0]
top_result = np.argmax(result)
print(top_result)
print(result[top_result])
And the result:
{'StatefulPartitionedCall/model/dense/Sigmoid': array([[0.9962392]], dtype=float32)}
[[0.9962392]]
0
0.9962392
Generally, TensorFlow is the main framework that uses the NHWC layout by default, while most others use NCHW. The OpenVINO Inference Engine therefore uses the NCHW layout, which suits the majority of networks. The model must be converted to NCHW layout in order to work with the Inference Engine.
The conversion of the native model format into IR involves the process where the Model Optimizer performs the necessary transformation to convert the shape to the layout required by the Inference Engine (N,C,H,W). Using the --input_shape parameter with the correct input shape of the model should suffice.
Besides, most TensorFlow models are trained with images in RGB order. In this case, inference results using the Inference Engine samples may be incorrect. By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with --reverse_input_channels argument.
I suggest you validate this by inferring your model with the Hello Classification Python Sample instead since this is one of the official samples provided to test the model's functionality.
You may refer to "Intel Math Kernel Library for Deep Neural Network" for a deeper explanation of the input shape.
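To make the layout and channel-order points concrete, here is a minimal NumPy sketch of turning a single H x W x C image into the 1 x C x H x W batch that the Inference Engine expects, with an optional RGB/BGR swap. The helper name to_nchw and the synthetic test image are hypothetical, and whether the channel swap is needed depends on how the model was trained and converted.

```python
import numpy as np


def to_nchw(image_hwc, reverse_channels=False):
    """Convert one H x W x C image into a 1 x C x H x W float32 batch."""
    image = np.asarray(image_hwc, dtype=np.float32)
    if reverse_channels:
        # Swap RGB <-> BGR by reversing the last (channel) axis.
        image = image[:, :, ::-1]
    # Move the channel axis to the front: (H, W, C) -> (C, H, W).
    chw = np.transpose(image, (2, 0, 1))
    # Add the batch axis and make the memory contiguous.
    return np.ascontiguousarray(chw[np.newaxis, ...])


# Hypothetical 180 x 180 RGB image that is pure red.
rgb = np.zeros((180, 180, 3), dtype=np.uint8)
rgb[..., 0] = 255

batch = to_nchw(rgb)
batch_bgr = to_nchw(rgb, reverse_channels=True)
print(batch.shape)
```

A transpose keeps each pixel's channels together, whereas a plain reshape to (1, 3, 180, 180) would scramble them. It may also be worth checking load_source in the question: np.resize(image, (180, 180)) returns a 2-D array built from the flattened pixel data, so it does not resample the image or keep the channel axis.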

Easy way to clamp Neural Network outputs between 0 and 1?

So I'm working on writing a GAN neural network and I want to set my network's output to 0 if it is less than 0 and 1 if it is greater than 1 and leave it unchanged otherwise. I'm pretty new to tensorflow, but I don't know of any tensorflow function or activation to do this without unwanted side effects. So I made my loss function so it calculates the loss as if the output was clamped, with this code:
def discriminator_loss(real_output, fake_output):
    real_output_clipped = min(max(real_output.numpy()[0], 0), 1)
    fake_output_clipped = min(max(fake_output.numpy()[0], 0), 1)
    real_clipped_tensor = tf.Variable([[real_output_clipped]], dtype = "float32")
    fake_clipped_tensor = tf.Variable([[fake_output_clipped]], dtype = "float32")
    real_loss = cross_entropy(tf.ones_like(real_output), real_clipped_tensor)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_clipped_tensor)
    total_loss = real_loss + fake_loss
    return total_loss
but I get this error:
ValueError: No gradients provided for any variable: ['dense_50/kernel:0', 'dense_50/bias:0', 'dense_51/kernel:0', 'dense_51/bias:0', 'dense_52/kernel:0', 'dense_52/bias:0', 'dense_53/kernel:0', 'dense_53/bias:0'].
Does anyone know a better way to do this, or a way to fix this error?
Thanks!
You can apply a ReLU layer from Keras as your final layer and set max_value=1.0. For example:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(32, input_shape=(16,)))
model.add(tf.keras.layers.Dense(32))
model.add(tf.keras.layers.ReLU(max_value=1.0))
You can read more about it here: https://www.tensorflow.org/api_docs/python/tf/keras/layers/ReLU
TF probably does not know how to update your network weights based on this loss. The inputs to the cross entropy are tensors (variables) that are created directly from NumPy values and are not connected to your actual network outputs.
If you want to perform operations on tensors that will remain within the graph and (hopefully) be differentiable, use the available TF operations. There's a "clip_by_value" operation described here: https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/clip_by_value.
E.g. real_output_clipped = tf.clip_by_value(real_output, clip_value_min=0, clip_value_max=1)
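Both suggestions compute the same clamp. The following NumPy sketch is only a stand-in for tf.clip_by_value and ReLU(max_value=1.0), not TensorFlow code. It checks that the two forms agree and estimates the derivative numerically. The helper names and sample values are hypothetical.

```python
import numpy as np


def clip01(x):
    # Same arithmetic as clipping to the range [0, 1].
    return np.clip(x, 0.0, 1.0)


def relu_capped(x, max_value=1.0):
    # Same arithmetic as a ReLU with an upper cap.
    return np.minimum(np.maximum(x, 0.0), max_value)


def numerical_grad(f, x, eps=1e-6):
    # Central finite difference, evaluated element-wise.
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


x = np.array([-0.5, 0.25, 0.75, 1.5])
clipped = clip01(x)
capped = relu_capped(x)
grads = numerical_grad(clip01, x)
print(clipped, grads)
```

The derivative is 1 inside the interval and 0 outside it, so outputs that are already below 0 or above 1 pass no gradient back through the clamp. That is worth keeping in mind if training stalls with many saturated outputs.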

Neural network only converges when data cloud is close to 0

I am new to tensorflow and am learning the basics at the moment so please bear with me.
My problem concerns strange non-convergent behaviour of neural networks when presented with the supposedly simple task of finding a regression function for a small training set consisting only of m = 100 data points {(x_1, y_1), (x_2, y_2),...,(x_100, y_100)}, where x_i and y_i are real numbers.
I first constructed a function that automatically generates a computational graph corresponding to a classical fully connected feedforward neural network:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import math

def neural_network_constructor(arch_list = [1,3,3,1],
                               act_func = tf.nn.sigmoid,
                               w_initializer = tf.contrib.layers.xavier_initializer(),
                               b_initializer = tf.zeros_initializer(),
                               loss_function = tf.losses.mean_squared_error,
                               training_method = tf.train.GradientDescentOptimizer(0.5)):
    n_input = arch_list[0]
    n_output = arch_list[-1]
    X = tf.placeholder(dtype = tf.float32, shape = [None, n_input])
    layer = tf.contrib.layers.fully_connected(
        inputs = X,
        num_outputs = arch_list[1],
        activation_fn = act_func,
        weights_initializer = w_initializer,
        biases_initializer = b_initializer)
    for N in arch_list[2:-1]:
        layer = tf.contrib.layers.fully_connected(
            inputs = layer,
            num_outputs = N,
            activation_fn = act_func,
            weights_initializer = w_initializer,
            biases_initializer = b_initializer)
    Phi = tf.contrib.layers.fully_connected(
        inputs = layer,
        num_outputs = n_output,
        activation_fn = tf.identity,
        weights_initializer = w_initializer,
        biases_initializer = b_initializer)
    Y = tf.placeholder(tf.float32, [None, n_output])
    loss = loss_function(Y, Phi)
    train_step = training_method.minimize(loss)
    return [X, Phi, Y, train_step]
With the above default values for the arguments, this function would construct a computational graph corresponding to a neural network with 1 input neuron, 2 hidden layers with 3 neurons each and 1 output neuron. The activation function is per default the sigmoid function. X corresponds to the input tensor, Y to the labels of the training data and Phi to the feedforward output of the neural network. The operation train_step performs one gradient-descent step when executed in the session environment.
So far, so good. If I now test a particular neural network (constructed with this function and the exact default values for the arguments given above) by making it learn a simple regression function for artificial data extracted from a sinewave, strange things happen:
Before training, the network output is a flat line. After 100,000 training iterations, it manages to partially learn the function, but only the part which is closer to 0. Beyond that, it becomes flat again. Further training does not decrease the loss function anymore.
This gets even stranger when I take the exact same data set, but shift all x-values by adding 500:
Here, the network completely refuses to learn. I cannot understand why this is happening. I have tried changing the architecture of the network and its learning rate, but have observed similar effects: the closer the x-values of the data cloud are to the origin, the easier the network can learn. After a certain distance to the origin, learning stops completely. Changing the activation function from sigmoid to ReLU has only made things worse; here, the network tends to just converge to the average, no matter what position the data cloud is in.
Is there something wrong with my implementation of the neural-network-constructor? Or does this have something to do with initialization values? I have tried to get a deeper understanding of this problem for quite a while now and would greatly appreciate some advice. What could be the cause of this? All thoughts on why this behaviour is occurring are very much welcome!
Thanks,
Joker
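One possible contributor, offered as an assumption since the data and plots are not shown here, is sigmoid saturation: when the inputs sit far from 0, the first-layer pre-activations become large and the sigmoid derivative is effectively zero, so gradient descent has almost nothing to work with. The NumPy sketch below illustrates this for a single hypothetical neuron and shows that standardizing the inputs restores a usable gradient. The weight, bias, and data range are made up for the illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_grad(z):
    # Derivative of the sigmoid: s * (1 - s).
    s = sigmoid(z)
    return s * (1.0 - s)


rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=100)  # hypothetical data cloud near 0
x_shifted = x + 500.0                 # same cloud shifted by 500
w, b = 0.7, 0.0                       # hypothetical first-layer weight and bias

grad_centered = sigmoid_grad(w * x + b).mean()
grad_shifted = sigmoid_grad(w * x_shifted + b).mean()

# Standardize the shifted inputs to zero mean and unit variance.
x_std = (x_shifted - x_shifted.mean()) / x_shifted.std()
grad_standardized = sigmoid_grad(w * x_std + b).mean()

print(grad_centered, grad_shifted, grad_standardized)
```

If this is what is happening, subtracting the mean of the x-values and dividing by their standard deviation before training would be a cheap thing to try.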

How to initialize a keras tensor employed in an API model

I am trying to implement a memory-augmented neural network, in which the memory and the read/write/usage weight vectors are updated according to a combination of their previous values. These weights are different from the classic weight matrices between layers that are automatically updated with the fit() function! My problem is the following: how can I correctly initialize these weights as Keras tensors and use them in the model? I explain it better with the following simplified example.
My API model is something like:
input = Input(shape=(5,6))
controller = LSTM(20, activation='tanh',stateful=False, return_sequences=True)(input)
write_key = Dense(4,activation='tanh')(controller)
read_key = Dense(4,activation='tanh')(controller)
w_w = Add()([w_u, w_r]) #<---- UPDATE OF WRITE WEIGHTS
to_write = Dot()([w_w, write_key])
M = Add()([M,to_write])
cos_sim = Dot()([M,read_key])
w_r = Lambda(lambda x: softmax(x,axis=1))(cos_sim) #<---- UPDATE OF READ WEIGHTS
w_u = Add()([w_u,w_r,w_w]) #<---- UPDATE OF USAGE WEIGHTS
retrieved_memory = Dot()([w_r,M])
controller_output = concatenate([controller,retrieved_memory])
final_output = Dense(6,activation='sigmoid')(controller_output)
You can see that, in order to compute w_w^t, I have to have first defined w_r^{t-1} and w_u^{t-1}. So, at the beginning I have to provide a valid initialization for these vectors. What is the best way to do it? The initializations I would like to have are:
M = K.variable(numpy.zeros((10,4))) # MEMORY
w_r = K.variable(numpy.zeros((1,10))) # READ WEIGHTS
w_u = K.variable(numpy.zeros((1,10))) # USAGE WEIGHTS
But, analogously to what was said in #2486 (entron), these commands do not return a Keras tensor with all the needed metadata, so this returns the following error:
AttributeError: 'NoneType' object has no attribute 'inbound_nodes'
I also thought of using the old M, w_r and w_u as further inputs at each iteration and, analogously, getting the same variables as outputs to complete the loop. But this means that I have to use the fit() function to train the model online with just the target as final output (Model 1), and use the predict() function on the model with all the secondary outputs (Model 2) to get the variables to use at the next iteration. I also have to pass the weight matrices from Model 1 to Model 2 using get_weights() and set_weights(). As you can see, it becomes a little bit messy and too slow.
Do you have any suggestions for this problem?
P.S. Please, do not focus too much on the API model above because it is a simplified (almost meaningless) version of the complete one where I skipped several key steps.