When two variables are evaluated in TensorFlow, it is not supposed to reuse the values from a previous evaluation. (This is what is mentioned in Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron.)
w = tf.constant(5)
x = w + 5
y = x**2 + 5
z = x**2 + 5
Take the piece of code above, for example.
y and z should have the same value if x is not modified between their evaluations.
But I tried modifying the value in between the evaluations, and they still have the same results.
with tf.compat.v1.Session() as sess:
    a = y.eval()
    x = w + 3
    b = z.eval()
print(a)
# 105
print(b)
# 105
I am sorry if this is a really dumb question, but I just wanted to get my basics clear. It would be really helpful if someone took the time to explain this. Thanks!
To clarify your doubts, in your code,
w = tf.constant(5)
x = w + 5
y = x**2 + 5
z = x**2 + 5
just builds the computation graph; until you create a session, the variables are not even initialized.
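For instance, printing one of these nodes at this point only shows the symbolic tensor, not a value (the exact tensor name below is indicative, not guaranteed):
print(y)  # e.g. Tensor("add_1:0", shape=(), dtype=int32) -- no value yet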
In the following code,
with tf.compat.v1.Session() as sess:
    a = y.eval()
    x = w + 3
    b = z.eval()
Once the session is opened, it takes care of placing all the operations on the device you are computing with (CPU or GPU), and it holds all the variable values.
Inside the session, once eval() is called, TensorFlow automatically determines the nodes the requested node depends on and evaluates those nodes first. To explain both of the scenarios you have taken:
1. a = y.eval()
Here y depends on x and x depends on w, so it evaluates w first and then evaluates x to calculate y.
2. b = z.eval()
Here z depends on x and x depends on w. Once again it starts from scratch: it evaluates w first and then evaluates x to calculate z. It does not reuse the results for x and w from the eval done on y.
In both evals, the node TensorFlow selects for the value of x is the first definition.
Below is the graph representation of the computation, where you can see that the second definition of x is not part of the graph (right corner).
If you want to run the computations in a specific order, you can use sess.run() instead of eval().
When sess.run() is called, it completes one set of computations in the graph in the following manner: it starts at the requested output(s) and works backward, computing the nodes that must be executed according to the set of dependencies.
Below is the modified example with the desired result.
import tensorflow as tf
w = tf.constant(5)
x = w + 5
y = x**2 + 5
x = w + 3
z = x**2 + 5
with tf.compat.v1.Session() as sess:
    # a = y.eval()
    # init.run()
    # b = z.eval()
    print(sess.run(z))
    print(sess.run(y))
Output:
69
105
You can notice that when sess.run(z) is requested, the x found while going backward is x = w + 3; similarly, for sess.run(y), the x found during the backward computation is x = w + 5.
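As an aside, if you want w and x to be evaluated only once for both outputs, you can fetch y and z in a single graph run. A minimal sketch in the same compat.v1 style, using the original definitions where y and z share the same x:
with tf.compat.v1.Session() as sess:
    # one graph run: the shared nodes w and x are computed only once
    y_val, z_val = sess.run([y, z])
    print(y_val)  # 105
    print(z_val)  # 105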
Related
I'm not new to numpy, but the solution to this problem has been eluding me for a while now, so I thought that I would ask here.
Given the equation Y = A.T @ X + X @ A,
where A and X are square matrices,
construct B such that B @ X.ravel() == Y.ravel().
The problem is just one small part of a bigger code that I am trying to implement so I wrote a minimal reproducible example given below.
The challenge that I pose to you is to fill out the function "create_B" in a way such that the assertion does not raise an error.
The list of things that I have attempted myself is far too long to include in this question, and besides I don't feel that trying to include a list of my own failed attempts adds any value to the question.
import numpy as np

def create_B(A):
    B = np.zeros((A.size, A.size))
    return B

A = np.random.uniform(-1, 1, (100, 100))
X = np.random.uniform(-1, 1, (100, 100))
Y = A.T @ X + X @ A
B = create_B(A)
assert np.allclose(B @ X.ravel(), Y.ravel())
I should probably mention that I can easily get this working using loops, but in reality A is very big and I need to do this many times, so I need an optimized solution (i.e. no Python loops).
Basically I am looking for a solution of the form:
def create_B(A):
    B = np.zeros((A.size, A.size))
    B[indices1] = A
    B[indices2] += A
    return B
Where the challenge then is to create the index tuples indices1 and indices2.
An example of a solution using loops (not what I'm looking for):
def create_B(A):
    B = np.zeros((A.size, A.size))
    indices = np.arange(A.size).reshape(A.shape)
    for i, row in enumerate(indices):
        for j, k in enumerate(row):
            B[k, indices[:, j]] = A[:, i]
            B[k, indices[i, :]] += A[:, j]
    return B
You can confirm for yourself that this is indeed a solution.
Your equation is then known as a continuous Lyapunov equation and can be solved with scipy.linalg.solve_continuous_lyapunov.
Try this:
from scipy.linalg import solve_continuous_lyapunov

Y = A.T @ X + X @ A
X_hat = solve_continuous_lyapunov(A.T, Y)

assert np.allclose(A.T @ X_hat + X_hat @ A, Y)
assert np.allclose(X_hat, X)
If you need B to be computed explicitly, you will inevitably end up with an n² × n² matrix.
I fear you will not get anything much better than what you have (unless you build it as a sparse matrix).
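That said, here is a minimal loop-free sketch of such a B using Kronecker products: for NumPy's row-major ravel, vec(A.T @ X) == kron(A.T, I) @ vec(X) and vec(X @ A) == kron(I, A.T) @ vec(X), so B is the sum of the two products.
import numpy as np

def create_B(A):
    # B @ X.ravel() == (A.T @ X + X @ A).ravel() for row-major ravel
    n = A.shape[0]
    I = np.eye(n)
    return np.kron(A.T, I) + np.kron(I, A.T)

A = np.random.uniform(-1, 1, (10, 10))
X = np.random.uniform(-1, 1, (10, 10))
Y = A.T @ X + X @ A
assert np.allclose(create_B(A) @ X.ravel(), Y.ravel())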
I am trying to implement an ARD kernel with NumPy, as given in the GPML book (M3 from Equation 5.2).
I am struggling to vectorize this equation for an N×M kernel computation. I have tried the following non-vectorized version. Can someone help vectorize this in NumPy/PyTorch?
import numpy as np
N = 30 # Number of data points in X1
M = 40 # Number of data points in X2
D = 6 # Number of features (ARD dimensions)
X1 = np.random.rand(N, D)
X2 = np.random.rand(M, D)
Lambda = np.random.rand(D, 1)
L_inv = np.diag(np.random.rand(D))
sigma_f = np.random.rand()
K = np.empty((N, M))
for n in range(N):
    for m in range(M):
        M3 = Lambda @ Lambda.T + L_inv**2
        d = (X1[n, :] - X2[m, :]).reshape(-1, 1)
        K[n, m] = sigma_f**2 * np.exp(-0.5 * d.T @ M3 @ d)
We can use the rules of broadcasting and the neat NumPy function einsum to vectorize array operations. In a few words, broadcasting allows us to operate on arrays in one-liners by adding new dimensions to the resulting array, while einsum allows us to perform operations with multiple arrays by working explicitly in index notation (instead of with matrices).
Luckily, no loops are necessary to calculate your kernel. Please see below the vectorized solution, the ARD_kernel function, which is about 30x faster on my machine than the original loopy version. einsum is usually about as fast as it gets, but it is possible that there are faster methods; I have not checked anything else (e.g. the usual @ operator instead of einsum).
Also, there is a term missing from the code (the Kronecker delta); I don't know if it was omitted on purpose (let me know if you have problems implementing it and I'll edit the answer).
import numpy as np
N = 300 # Number of data points in X1
M = 400 # Number of data points in X2
D = 6 # Number of features (ARD dimensions)
np.random.seed(1) # Fix random seed for reproducibility
X1 = np.random.rand(N, D)
X2 = np.random.rand(M, D)
Lambda = np.random.rand(D, 1)
L_inv = np.diag(np.random.rand(D))
sigma_f = np.random.rand()
# Loopy function
def ARD_kernel_loops(X1, X2, Lambda, L_inv, sigma_f):
    K = np.empty((N, M))
    M3 = Lambda @ Lambda.T + L_inv**2
    for n in range(N):
        for m in range(M):
            d = (X1[n, :] - X2[m, :]).reshape(-1, 1)
            K[n, m] = np.exp(-0.5 * d.T @ M3 @ d)
    return K * sigma_f**2
# Vectorized function
def ARD_kernel(X1, X2, Lambda, L_inv, sigma_f):
    M3 = Lambda.squeeze() * Lambda + L_inv**2  # use broadcasting to avoid the transpose
    d = X1[:, None] - X2[None, ...]            # use broadcasting to avoid the loops
    # order='F' for memory layout (as the arrays are (N, M, D) instead of (D, N, M))
    return sigma_f**2 * np.exp(-0.5 * np.einsum('ijk,kl,ijl->ij', d, M3, d, order='F'))
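A quick check that both versions agree, plus a sketch of the omitted Kronecker delta term (sigma_n here is a hypothetical noise scale; the term only contributes when computing the kernel of a set with itself):
K_loop = ARD_kernel_loops(X1, X2, Lambda, L_inv, sigma_f)
K_vec = ARD_kernel(X1, X2, Lambda, L_inv, sigma_f)
assert np.allclose(K_loop, K_vec)

# hypothetical noise term: it only fires on the diagonal of K(X1, X1)
sigma_n = 0.1
K_noisy = ARD_kernel(X1, X1, Lambda, L_inv, sigma_f) + sigma_n**2 * np.eye(N)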
There is perhaps an additional optimisation. The examples of the M matrices given are all positive definite. This means that the Cholesky decomposition can be applied, so that we can find an upper triangular U such that
M = U' * U
The point of this is that if we apply U to the xs, so that
y[p] = U * x[p],  p = 1..N
then
(x[p] - x[q])' * M * (x[p] - x[q]) = (y[p] - y[q])' * (y[p] - y[q])
Thus if there are N vectors x, each of dimension d, we convert the N² operations of cost O(d²) on the LHS into N² operations of cost O(d) on the RHS.
This costs one extra Cholesky decomposition (O(d³)) and N applications of U to the xs (O(d²) each).
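A minimal sketch of this idea applied to the kernel above, assuming M3 is positive definite (np.linalg.cholesky returns the lower factor, so its transpose plays the role of U):
from scipy.spatial.distance import cdist

def ARD_kernel_cholesky(X1, X2, Lambda, L_inv, sigma_f):
    M3 = Lambda @ Lambda.T + L_inv**2
    U = np.linalg.cholesky(M3).T       # upper triangular, M3 == U.T @ U
    Y1 = X1 @ U.T                      # each row is y = U @ x
    Y2 = X2 @ U.T
    sq = cdist(Y1, Y2, 'sqeuclidean')  # (y_n - y_m)' (y_n - y_m) for all pairs
    return sigma_f**2 * np.exp(-0.5 * sq)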
import tensorflow as tf

x = tf.Variable(1)
x = x + 1
init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    print(sess.run(x))
    print(sess.run(x))
The output is
2
2
But I think the output should be
2
3
On the first run, x should have been updated to 2, and on the second run, x should be 3.
Can anyone tell me why the second run of x is also 2? If x can't be updated by the first run, how are the parameters of a neural network updated?
Update
x = tf.Variable(1)
x.assign(x + 1)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    # print(x)
    print(sess.run(x))
    print(sess.run(x))
The output is
1
1
It is amazing.
Here's an analysis of your examples and of Ishant Mrinal's; it should help you understand what's going on here.
Example 1
x = tf.Variable(1)
Creation of a Python variable x and of a TensorFlow node variable_1. The Python variable x holds a logical pointer to the node variable_1.
x = x + 1
A Python assignment, a destructive operation:
x now holds a pointer to the operation sum(variable_1, constant(1)).
init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
The usual variable-initialization code.
print(sess.run(x))
Execution of x = execution of sum(variable_1, constant(1)) = 2
print(sess.run(x))
Execution of x = execution of sum(variable_1, constant(1)) = 2
Example 2
x = tf.Variable(1)
Creation of a Python variable x and of a TensorFlow node variable_1. The Python variable x holds a logical pointer to the node variable_1.
init = tf.global_variables_initializer()
initialization of variable_1.
with tf.Session() as sess:
    init.run()
execution of the initialization.
# here x points to the variable
print(sess.run(x))
evaluation of x = 1.
x = x + 1
Definition of a new node, exactly as in the previous example.
print(sess.run(x))
evaluation of the sum operation, thus 2.
Example 3
x = tf.Variable(1)
usual creation.
as_op = x.assign(x+1)
definition of a sum node followed by the definition of an assignment node, held by the Python variable as_op.
This operation forces the order of execution between these two nodes: first execute the sum node, then use the result to assign it to the variable node variable_1.
init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
usual init ops
# here x points to the variable
print(sess.run(x))
evaluation of variable_1, thus 1.
sess.run(as_op)
execution of the sum and the assignment, thus temp = variable_1 + 1; variable_1 = temp;
print(sess.run(x))
extraction of the value pointed to by x, thus 2.
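If you want x to actually increment on every run, here is a minimal sketch using tf.assign_add (running the op updates the variable and returns the new value):
x = tf.Variable(1)
inc = tf.assign_add(x, 1)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    print(sess.run(inc))  # 2
    print(sess.run(inc))  # 3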
The TensorFlow add op returns an updated tensor.
x = x + 1
with tf.Session() as sess:
    init.run()
    print(x)
    # here x is an addition op, hence the result is the same for both runs:
    # Tensor("add:0", shape=(), dtype=int32)
    # for both runs x is just the same add op
    print(sess.run(x))  # result: 2
    print(sess.run(x))  # result: 2
If you change the location of the addition op, then the values will be different, since you will be accessing the initial value and then the updated value from the add op.
x = tf.Variable(1)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    # here x points to the variable
    print(sess.run(x))  # result: 1
    x = x + 1
    # here x points to the add op
    print(sess.run(x))  # result: 2
Using an assign op to get the updated value:
x = tf.Variable(1)
as_op = x.assign(x + 1)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    # here x points to the variable
    print(sess.run(x))  # result: 1
    sess.run(as_op)     # variable updated
    print(sess.run(x))  # result: 2
Consider this example:
train_op = opt.minimize(loss)
gradients = tf.gradients(loss, tf.trainable_variables())
Are the gradients computed twice or just once?
Or this example:
a = y + z
b = y + z
Is the addition y + z computed twice or just once?
It is computed only once. See this post for more info about this and the other optimizations TensorFlow performs at runtime.
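For instance, here is a minimal sketch with a hypothetical scalar loss, fetching the training op and the gradients in a single run so the shared subgraph is executed within one graph run:
import tensorflow as tf

x = tf.Variable(3.0)
loss = x**2
opt = tf.train.GradientDescentOptimizer(0.1)
train_op = opt.minimize(loss)
gradients = tf.gradients(loss, tf.trainable_variables())

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # both fetches come from the same graph run, so common work can be shared
    _, g = sess.run([train_op, gradients])
    print(g)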
I am not sure if this is the right place to raise this. I was following https://www.tensorflow.org/get_started/get_started and came across the following sample code:
W = tf.Variable([.3], tf.float32)
b = tf.Variable([-.3], tf.float32)
x = tf.placeholder(tf.float32)
linear_model = W * x + b
In the section on the loss function, it has the following:
y = tf.placeholder(tf.float32)
squared_deltas = tf.square(linear_model - y)
loss = tf.reduce_sum(squared_deltas)
print(sess.run(loss, {x:[1,2,3,4], y:[0,-1,-2,-3]}))
Why is the value of y [0,-1,-2,-3]? Based on
linear_model = W * x + b,
y would be 0.3x - 0.3. So for x of [1,2,3,4], y should be [0, 0.3, 0.6, 0.9].
Or am I missing something?
Yes, you are missing something. The goal of this exercise is to show that you first build up a graph (W*x + b = linear_model), and then supply values for the placeholders.
To do this you supply x and y, and see what the difference is between what you expected (the y placeholder) and what you got (the linear_model output).
You are confusing the result of the equation with what the tutorial wanted to get out of it. If you go on with the tutorial, it will probably teach you how to train your weights to get the solution you expect.
Good luck!
In this code snippet, 'x' is the input and 'y' serves as the label, as you seem to already understand.
'W' and 'b' are variables that the program should 'learn' such that when x = [1,2,3,4], the model output ends up as [0,-1,-2,-3].
The values of 'W' and 'b' you see are the initial values.
Not included in this code is the update step, where you update the weights after computing the gradient based on the loss function. After a few iterations, you should get 'W' and 'b' such that for x = [1,2,3,4] the model produces y = [0,-1,-2,-3].
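For reference, here is a minimal sketch of that update step, along the lines of the rest of the tutorial (the learning rate and iteration count here are assumptions):
optimizer = tf.train.GradientDescentOptimizer(0.01)  # assumed learning rate
train = optimizer.minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):  # assumed number of iterations
        sess.run(train, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]})
    # W and b should end up near -1 and 1, so W * x + b is close to [0, -1, -2, -3]
    print(sess.run([W, b]))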