About TensorFlow backpropagation

If I have two neural networks A and B, and I use the output of network A to feed the input (placeholder) of network B, and I use an optimizer to minimize the loss of network B, can network A's parameters be updated by backpropagation?

Yes, if "feed" is done in TensorFlow; no, if you do it manually.
Specifically, if you evaluate A, then train B with those outputs manually fed in (say, as a feed dict), A will not change, because it is not involved in the training stage.
If you set the input of the B network to be the output of an op in A (instead of a tf.Placeholder, for instance), then you can train the combined network, which will update A's parameters. In this case, though, you're really just training a combined network "AB", not two separate networks.
A concrete example:
import numpy as np
import tensorflow as tf
# A network
A_input = tf.placeholder(tf.float32, [None,100])
A_weights = tf.Variable(tf.random_normal([100,10]))
A_output = tf.matmul(A_input,A_weights)
# B network
B_input = tf.placeholder(tf.float32, [None,10])
B_weights = tf.Variable(tf.random_normal([10,5]))
B_output = tf.matmul(B_input,B_weights)
# AB network
AB_input = A_output
AB_weights = tf.Variable(tf.random_normal([10,5]))
AB_output = tf.matmul(AB_input,AB_weights)
test_inputs = np.random.rand(17,100)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
A_out = sess.run(A_output, feed_dict={A_input: test_inputs})
print('A output shape:', A_out.shape)
B_out = sess.run(B_output, feed_dict={B_input: A_out})
print('B output shape:', B_out.shape)
AB_out = sess.run(AB_output, feed_dict={A_input: test_inputs})
print('AB output shape:', AB_out.shape)
In the first case, we've fed network B with the outputs of network A by using a feed_dict. This evaluates network A in TensorFlow, pulls the results back into Python, then evaluates network B in TensorFlow. If you try to train network B in this fashion, you'll only update the parameters of network B.
In the second case, we've fed the "B" part of network AB by directly connecting the output of network A to the input of network AB. Evaluating network AB never pulls the intermediate results of network A back into Python, so if you train network AB in this fashion, you can update the parameters of the combined network. (Note: your training inputs are fed to the A_input of network AB, not to the intermediate tensor AB_input.)
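To make the second case concrete, here is a minimal training sketch (the target placeholder, loss, and learning rate are made up for illustration): minimizing a loss on AB_output updates both A_weights and AB_weights, because gradients flow back through A_output.
# Hypothetical target and loss, only to demonstrate gradient flow
target = tf.placeholder(tf.float32, [None, 5])
loss = tf.reduce_mean(tf.square(AB_output - target))
# Plain SGD creates no extra variables, so no re-initialization is needed
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
test_targets = np.random.rand(17, 5)
A_before = sess.run(A_weights)
sess.run(train_op, feed_dict={A_input: test_inputs, target: test_targets})
A_after = sess.run(A_weights)
# A_weights changed, even though we only minimized a loss on AB_output
print('A_weights changed:', not np.allclose(A_before, A_after))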

Related

About the input of a quantum neural network in TensorFlow Quantum

I created a quantum neural network using TensorFlow Quantum. Its input is a tensor converted from a circuit. Regarding this input circuit, I found that if the parameters of the circuit are also specified by tensors, the quantum neural network cannot be trained.
With ordinary (numeric) parameters in the circuit, the network trains normally:
theta_g = 1
blob_size = abs(1 - 4) / 5
spread_x = np.random.uniform(-blob_size, blob_size)
spread_y = np.random.uniform(-blob_size, blob_size)
angle = theta_g + spread_y
cir = cirq.Circuit(cirq.ry(-angle)(qubit), cirq.rx(-spread_x)(qubit))
discriminator_network(tfq.convert_to_tensor([cir]))
But when I use the following code, the quantum neural network cannot be trained
theta_g = tf.constant([1])
blob_size = abs(1 - 4) / 5
spread_x = np.random.uniform(-blob_size, blob_size)
spread_y = np.random.uniform(-blob_size, blob_size)
spread_x = tf.constant(spread_x)
spread_y = tf.constant(spread_y)
angle = theta_g + spread_y
cir = cirq.Circuit(cirq.ry(-angle)(qubit), cirq.rx(-spread_x)(qubit))
discriminator_network(tfq.convert_to_tensor([cir]))
The discriminator_network:
def discriminator():
    theta = sympy.Symbol('theta')
    q_model = cirq.Circuit(cirq.ry(theta)(qubit))
    q_data_input = tf.keras.Input(shape=(), dtype=tf.dtypes.string)
    expectation = tfq.layers.PQC(q_model, cirq.Z(qubit))
    expectation_output = expectation(q_data_input)
    classifier = tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid)
    classifier_output = classifier(expectation_output)
    model = tf.keras.Model(inputs=q_data_input, outputs=classifier_output)
    return model
Without being able to see the error trace, I would say the problem in the second snippet is that you have placed tf.constant objects into the placeholders of the cirq.Circuit. Your first example works because cirq.Circuit knows how to interpret values with np.float32 datatypes; Cirq does not know how to interpret values with tf.float32 (or any tf.dtypes.*, for that matter).
TensorFlow Quantum's entry point for interfacing TensorFlow datatypes with cirq.Circuit objects is resolving the sympy.Symbol values inside the circuits in TFQ-native operations (which you have done in creating the tfq.layers.PQC).
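For example, a minimal sketch of that entry point (the qubit and the batch of values here are my own assumptions): parameterize the data circuit with sympy.Symbols and let a tfq layer resolve them from a tensor, so the values stay inside the TensorFlow graph.
import cirq
import sympy
import tensorflow as tf
import tensorflow_quantum as tfq
qubit = cirq.GridQubit(0, 0)
angle = sympy.Symbol('angle')
spread = sympy.Symbol('spread')
# Symbols, not tf.constants, go into the circuit
circuit = cirq.Circuit(cirq.ry(-angle)(qubit), cirq.rx(-spread)(qubit))
# The symbols are resolved from TensorFlow values inside the tfq op;
# symbol_values has shape [batch_size, n_symbols]
values = tf.constant([[1.0, 0.3]])
expectation = tfq.layers.Expectation()(
    circuit,
    symbol_names=[angle, spread],
    symbol_values=values,
    operators=cirq.Z(qubit))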
Does this help clear things up?
-Michael

Neural network only converges when data cloud is close to 0

I am new to TensorFlow and am learning the basics at the moment, so please bear with me.
My problem concerns strange non-convergent behaviour of neural networks when presented with the supposedly simple task of finding a regression function for a small training set consisting only of m = 100 data points {(x_1, y_1), (x_2, y_2),...,(x_100, y_100)}, where x_i and y_i are real numbers.
I first constructed a function that automatically generates a computational graph corresponding to a classical fully connected feedforward neural network:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import math
def neural_network_constructor(arch_list = [1,3,3,1],
                               act_func = tf.nn.sigmoid,
                               w_initializer = tf.contrib.layers.xavier_initializer(),
                               b_initializer = tf.zeros_initializer(),
                               loss_function = tf.losses.mean_squared_error,
                               training_method = tf.train.GradientDescentOptimizer(0.5)):
    n_input = arch_list[0]
    n_output = arch_list[-1]
    X = tf.placeholder(dtype = tf.float32, shape = [None, n_input])
    layer = tf.contrib.layers.fully_connected(
        inputs = X,
        num_outputs = arch_list[1],
        activation_fn = act_func,
        weights_initializer = w_initializer,
        biases_initializer = b_initializer)
    for N in arch_list[2:-1]:
        layer = tf.contrib.layers.fully_connected(
            inputs = layer,
            num_outputs = N,
            activation_fn = act_func,
            weights_initializer = w_initializer,
            biases_initializer = b_initializer)
    Phi = tf.contrib.layers.fully_connected(
        inputs = layer,
        num_outputs = n_output,
        activation_fn = tf.identity,
        weights_initializer = w_initializer,
        biases_initializer = b_initializer)
    Y = tf.placeholder(tf.float32, [None, n_output])
    loss = loss_function(Y, Phi)
    train_step = training_method.minimize(loss)
    return [X, Phi, Y, train_step]
With the above default values for the arguments, this function would construct a computational graph corresponding to a neural network with 1 input neuron, 2 hidden layers with 3 neurons each and 1 output neuron. The activation function is per default the sigmoid function. X corresponds to the input tensor, Y to the labels of the training data and Phi to the feedforward output of the neural network. The operation train_step performs one gradient-descent step when executed in the session environment.
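For concreteness, this is roughly how I use the constructor in a session (a sketch; the sine data and iteration count match the experiment described next):
X, Phi, Y, train_step = neural_network_constructor()
# Toy sine-wave regression data: m = 100 points
x_data = np.linspace(0, 2 * math.pi, 100).reshape(-1, 1)
y_data = np.sin(x_data)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100000):
        sess.run(train_step, feed_dict={X: x_data, Y: y_data})
    predictions = sess.run(Phi, feed_dict={X: x_data})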
So far, so good. If I now test a particular neural network (constructed with this function and the exact default values for the arguments given above) by making it learn a simple regression function for artificial data extracted from a sinewave, strange things happen:
Before training, the network's output is a flat line. After 100,000 training iterations, it manages to partially learn the function, but only the part closer to 0. After this, it becomes flat again, and further training no longer decreases the loss.
This gets even stranger when I take the exact same data set but shift all x-values by adding 500:
Here, the network completely refuses to learn. I cannot understand why this is happening. I have tried changing the architecture of the network and its learning rate, but have observed similar effects: the closer the x-values of the data cloud are to the origin, the easier the network can learn. After a certain distance from the origin, learning stops completely. Changing the activation function from sigmoid to ReLU has only made things worse; here, the network tends to just converge to the average, no matter where the data cloud is.
Is there something wrong with my implementation of the neural-network constructor? Or does this have something to do with initialization values? I have tried to get a deeper understanding of this problem for quite a while now and would greatly appreciate some advice. What could be the cause of this? All thoughts on why this behaviour is occurring are very much welcome!
Thanks,
Joker

Deploy a trained TensorFlow model from saved files to a Raspberry Pi

I recently built a TensorFlow model and trained it; here is the input:
data = tf.placeholder(tf.float32, [None,20,1]) #Number of examples, number of input, dimension of each input
After computation through LSTM cells and dense layers, the final prediction is made here:
logits = tf.layers.dense(inputs=den3, units=1)
After training it and saving it, like so:
for i in range(epoch):
    for j in range(batchSize):
        loss = sess.run([step, error],
                        feed_dict={data: np.array([tx[j]]), target: np.array([ty[j]])})
saver.save(sess, 'model')
This in turn saves four files (the checkpoint file plus the .meta, .index, and .data files). I then copy them over to the Pi, but I don't know how to just open them, pass some inputs, and read the output, without building the whole graph and assigning each variable.
You should just be able to copy the files over to your Pi and restore them there. tf.train.import_meta_graph rebuilds the graph from the .meta file, so you don't have to redefine it by hand, and saver.restore() then loads the variable values:
import tensorflow as tf
sess = tf.Session()
saver = tf.train.import_meta_graph('model.meta')
saver.restore(sess, 'model')
This is based on https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py
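Once restored, you can look up the input and output tensors by name and run inference without redefining anything. A sketch (the tensor names below are placeholders I made up; inspect graph.get_operations() to find the real ones from your model):
import numpy as np
graph = tf.get_default_graph()
# Hypothetical names; check graph.get_operations() for the actual ones
data = graph.get_tensor_by_name('Placeholder:0')
logits = graph.get_tensor_by_name('dense/BiasAdd:0')
prediction = sess.run(logits, feed_dict={data: np.random.rand(1, 20, 1)})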

Use Inception v3 with a batch of images in TensorFlow

In one of my computer vision projects, I use the public pre-trained Inception v3 available here: http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz. This network sits at the beginning of my classification chain (a lot of stuff is performed on the logits produced by the network). I would like to feed this network a batch of images (instead of processing images sequentially) in order to make it faster.
However, the provided network had been "frozen", and it can only process one image at a time.
Is there any solution to "unfreeze" a graph and adapt it so that I can use it on batch of images?
(N.B.: I found related topics on the internet, but they all suggest taking a more recent network, available for instance here: http://download.tensorflow.org/models/image/imagenet/inception-v3-2016-03-01.tar.gz. This is not what I would like to do, since a lot of stuff has been tuned on the output of the frozen model.)
Not sure if this is too late, but here is the code snippet that I used:
# First load the model into an old graph
proto_file = ... # downloaded inception protofile
graph_def = tf.GraphDef.FromString(open(proto_file, 'rb').read())
to_delete = {"DecodeJpeg", "Cast", "ExpandDims", "pool_3/_reshape", "softmax"}
graph_def = delete_ops_from_graph(graph_def, to_delete)
new_graph = tf.Graph()
with new_graph.as_default():
    x = tf.placeholder(tf.uint8, [None, None, None, 3], name="batched_inputs")
    x_cast = tf.cast(x, dtype=tf.float32)
    y = tf.import_graph_def(graph_def, input_map={"ExpandDims:0": x_cast},
                            return_elements=["pool_3:0"], name="")
    ...
Now new_graph is a graph with a batch dimension (it takes a 4-D NHWC tensor). Note that this is good if you want to use inception-2015-12-05.tgz as a feature extractor; you would take the output from output = new_graph.get_tensor_by_name("pool_3:0").
For the definition of delete_ops_from_graph, see Tensorflow: delete nodes from graph
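In case that link rots, a minimal sketch of what delete_ops_from_graph is assumed to do (drop the named nodes, and anything scoped under them, from the GraphDef):
def delete_ops_from_graph(graph_def, to_delete):
    # Keep every node that is not one of the unwanted names
    # (or nested under one of them); assumed semantics of the linked answer
    kept = [n for n in graph_def.node
            if not any(n.name == d or n.name.startswith(d + '/')
                       for d in to_delete)]
    out = tf.GraphDef()
    out.node.extend(kept)
    return out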

Having trouble introducing a constant matrix to multiplication in TensorFlow

I am trying to implement a layer that is not fully connected. I have a matrix that specifies the connectivity I desire in the variable connectivity_matrix, which is a numpy array of ones and zeros.
The way I am currently trying to implement the layer is by element-wise multiplying the weights by this connectivity matrix. Is this the correct way to do it in TensorFlow? Here is what I have so far:
import numpy as np
import tensorflow as tf
import tflearn
num_input = 10
num_layer1 = 313
num_output = 700
# For example:
connectivity_matrix = np.array(np.random.choice([0, 1], size=(num_layer1, num_output)), dtype='float32')
input = tflearn.input_data(shape=[None, num_input])
# Here is where I specify the connectivity in tensorflow
connectivity = tf.constant(connectivity_matrix, shape=[num_layer1, num_output])
# One basic, fully connected layer
layer1 = tflearn.fully_connected(input, num_layer1, activation='relu')
# Here is where I want to have a non-fully connected layer
W = tf.Variable(tf.random_uniform([num_layer1, num_output]))
b = tf.Variable(tf.zeros([num_output]))
# Take a fully connected W and element-wise multiply it by the connectivity matrix
W_filtered = tf.multiply(connectivity, W)
output = tf.matmul(layer1, W_filtered) + b
Masking out unwanted connections in each iteration should work, but I am not sure what the convergence properties are like. It may be okay for a small enough learning rate.
Another approach would be to penalize unwanted weights in the cost function. You would use a mask matrix with 1s at unwanted connections and 0s at wanted ones (or a smoother transition). This mask would be multiplied element-wise with the weights, squared/scaled, and added to the cost function. This should converge more smoothly.
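A rough sketch of that penalty term, reusing the names from your snippet (penalty_strength is an assumed hyperparameter):
# Mask with 1s at unwanted connections, 0s at wanted ones
mask = 1.0 - connectivity
penalty_strength = 0.01  # assumed hyperparameter
# Squared, scaled, and added to the task loss
penalty = penalty_strength * tf.reduce_sum(tf.square(mask * W))
# total_loss = task_loss + penalty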
P.S.: If you've made progress on this, it would be great to hear your comments as I am also working on this problem.