I have cythonized the following file that uses numpy's matrix multiplication:
def cell(float[:, ::1] a, float[:, ::1] b):
    c = a @ b
    return c
However, when I call it with:
from matmul import cell
import numpy as np
a = np.zeros((1, 64), dtype=np.float32)
b = np.zeros((64, 64), dtype=np.float32)
c = cell(a, b)
I get the following error:
TypeError: unsupported operand type(s) for @: _memoryviewslice and _memoryviewslice
How can I perform matrix multiplication with Cython?
Context: the function "cell" is part of code I wrote that performs prediction with an LSTM network (I wrote it by hand, without using PyTorch or TensorFlow, just NumPy). I need to speed up the code so that I can run the network in real time.
If that's all you're doing, there's literally no point in adding the types for the arguments of cell: all you're doing is adding expensive type checks for no reason. Cython can't make any useful use of these types, so just leave a and b untyped.
If you do actually need to mix memoryviews with NumPy whole-array operations, the easiest solution is to call np.asarray:
import numpy as np

def cell(float[:, ::1] a, float[:, ::1] b):
    c = np.asarray(a) @ np.asarray(b)
    return c
You aren't getting any benefit from Cython here: it's just calling into the NumPy matrix-multiply code. So only do this where you need to mix it with operations that do benefit from Cython.
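For illustration, a minimal sketch (my own; the bias-add loop is a made-up example, not from the question) of the kind of function where the typed memoryviews do pay off, because the NumPy matmul is mixed with an explicit loop that Cython compiles to C:

import numpy as np

def cell_with_bias(float[:, ::1] a, float[:, ::1] b, float[:, ::1] bias):
    # The matmul itself still runs in NumPy...
    c = np.asarray(a) @ np.asarray(b)
    cdef float[:, ::1] cv = c
    cdef Py_ssize_t i, j
    # ...but this element-wise loop compiles to plain C indexing.
    for i in range(cv.shape[0]):
        for j in range(cv.shape[1]):
            cv[i, j] += bias[i, j]
    return c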
Related
I had already implemented an optimization (gradient descent) algorithm using the TensorFlow Probability built-in KL divergence as a loss function. Theoretically it worked well, but then I found out that the list of registered distributions that you are able to compare with the KL divergence is quite limited. I tried to minimize the KL divergence between a Gaussian mixture model (as the true distribution) and a normal distribution (optimizing the mean and std such that the KL divergence becomes minimal), which was not possible.
So I tried to implement my own approach, which did not work:
import numpy as np
from scipy.stats import norm
import tensorflow as tf
The idea I had was to create densities of the needed distributions via scipy.stats (let's say normal distributions) and transform the density variables to tensors:
x = np.arange(-10,10,0.001)
mu_train = tf.Variable(2.0)
p_pdf = norm.pdf(x, 0, 1)
q_pdf = norm.pdf(x, mu_train,1)
p = tf.convert_to_tensor(p_pdf)
q = tf.convert_to_tensor(q_pdf, dtype=tf.float64)
Now I defined the KL-Divergence as a function that only depends on q.
def kl_loss(q):
    return tf.reduce_sum(
        tf.where(p == 0, tf.zeros(p.shape, tf.float64), p * tf.math.log(p / q))
    )
Then I calculated the gradient of kl_loss with respect to mu_train, but the output I get from this is None:
with tf.GradientTape() as tape:
    tape.watch(mu_train)
    loss = kl_loss(q)

d_loss_d_mu = tape.gradient(loss, mu_train)
print(d_loss_d_mu)
Now that I have thought about it, getting None as the output makes sense to me: kl_loss(q) only depends on the values q(x) generated by the density q; it does not depend on mu_train directly, since mu_train is just a parameter of the normal distribution, while the input to kl_loss is an array/tensor of values of that distribution.
Does anyone know a workaround for this, or does anyone have a completely different approach that yields the KL divergence as a loss function for arbitrary distributions, such that I can compute gradients with respect to the parameters and run a gradient-descent minimizer?
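(For what it's worth, a minimal sketch of one possible workaround, not from the thread: the gradient is None because q was computed by SciPy outside the tape. If q is instead rebuilt from mu_train with TensorFlow ops inside the tape, gradients flow; the hand-written Gaussian pdf below is my own addition.)

import numpy as np
import tensorflow as tf
from scipy.stats import norm

x = np.arange(-10, 10, 0.001)
p = tf.convert_to_tensor(norm.pdf(x, 0, 1))  # fixed target density
x_t = tf.convert_to_tensor(x)
mu_train = tf.Variable(2.0, dtype=tf.float64)

def normal_pdf(x, mu, sigma):
    # Gaussian density written in TF ops so gradients w.r.t. mu exist
    return tf.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def kl_loss(q):
    return tf.reduce_sum(
        tf.where(p == 0, tf.zeros(p.shape, tf.float64), p * tf.math.log(p / q))
    )

with tf.GradientTape() as tape:
    q = normal_pdf(x_t, mu_train, 1.0)  # q now depends on mu_train inside the tape
    loss = kl_loss(q)

print(tape.gradient(loss, mu_train))  # a real gradient instead of None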
I am supplying different minibatches to optimize a GPflow model (SVGP). If I decorate the optimization_step with @tf.function, I get the following error:
NotImplementedError: Cannot convert a symbolic Tensor (concat:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported
In order for the optimizer to run I had to remove the tf.function decorator, losing the speed-up advantages. What do I need to change so that I can keep using the tf.function decorator?
The shapes and types of the xAndY inputs are as follows (all NumPy arrays):
type(xAndY)
Out[71]: tuple
xAndY[0].shape
Out[72]: (245760, 2)
xAndY[1].shape
Out[73]: (245760, 1)
type(xAndY[0])
Out[74]: numpy.ndarray
def run_optimizer_on_minibatch_size(model, iterations, minibatch_size, xAndY):
    """
    Utility function running a Scipy optimizer

    :param model: GPflow model
    :param iterations: number of iterations
    """
    N = xAndY[0].shape[0]
    tensor_data = tuple(map(tf.convert_to_tensor, xAndY))
    train_dataset = tf.data.Dataset.from_tensor_slices(tensor_data).repeat().shuffle(N)
    logf = []
    train_iter = iter(train_dataset.batch(minibatch_size))
    training_loss = model.training_loss_closure(train_iter, compile=True)
    optimizer = gpflow.optimizers.Scipy()

    @tf.function  # had to remove this decorator
    def optimization_step():
        optimizer.minimize(training_loss, model.trainable_variables)

    # step = 0
    for step in range(iterations):
        optimization_step()
        if step % 10 == 0:
            elbo = -training_loss().numpy()
            logf.append(elbo)
            print(elbo)
    return logf

from gpflow.ci_utils import ci_niter

maxiter = ci_niter(20000)
logf = run_optimizer_on_minibatch_size(m, maxiter, minibatch_size, (X, Y))
GPflow's gpflow.optimizers.Scipy() is a wrapper around Scipy's minimize(), and as it calls into non-TensorFlow operations, you cannot wrap it in tf.function. Moreover, the optimizers implemented in Scipy's minimize are second-order methods that assume that your gradients are not stochastic, and aren't compatible with minibatching.
If you want to do full-batch optimization with Scipy: The minimize() method of gpflow.optimizers.Scipy(), by default, does wrap the objective and gradient computation inside tf.function (see its compile argument with default True). It also does the full optimization, so you only have to call the minimize() method once (by default it runs until convergence or failure to continue optimization; you can supply a maximum number of iterations using the options=dict(maxiter=1000) argument).
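For example, a minimal sketch of the full-batch route (assuming GPflow 2.x, with model, X, and Y as in the question):

training_loss = model.training_loss_closure((X, Y))  # full dataset, no minibatching
opt = gpflow.optimizers.Scipy()
opt.minimize(
    training_loss,
    model.trainable_variables,
    options=dict(maxiter=1000),  # optional cap on the number of iterations
)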
If you want to use mini-batching: simply use one of the TensorFlow optimizers, such as tf.optimizers.Adam(), and then your code should run fine including the @tf.function decorator on your optimization_step() function (and in that case you do need to call it in a loop as in your example).
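A minimal sketch of that mini-batch route, reusing training_loss and model from the question's code (the learning rate is an arbitrary choice):

optimizer = tf.optimizers.Adam(learning_rate=0.01)

@tf.function
def optimization_step():
    optimizer.minimize(training_loss, var_list=model.trainable_variables)

for step in range(iterations):
    optimization_step()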
I'm trying to calculate the gradients of the samples from a Bernoulli distribution w.r.t. the probabilities p (of a sample being 1).
I tried using both the implementation of the Bernoulli distribution provided in tensorflow.contrib.distributions and my own simple implementation based on this discussion. However, both methods fail when I try to calculate the gradients.
Using the Bernoulli implementation:
import tensorflow as tf
from tensorflow.contrib.distributions import Bernoulli
p = tf.constant([0.2, 0.6])
b = Bernoulli(p=p)
s = b.sample()
g = tf.gradients(s, p)
with tf.Session() as session:
    print(session.run(g))
The above code gives me the following error:
TypeError: Fetch argument None has invalid type <class 'NoneType'>
Using my implementation:
import tensorflow as tf
p = tf.constant([0.2, 0.6])
shape = [1, 2]
s = tf.select(tf.random_uniform(shape) - p > 0.0, tf.ones(shape), tf.zeros(shape))
g = tf.gradients(s, p)
with tf.Session() as session:
    print(session.run(g))
Same error:
TypeError: Fetch argument None has invalid type <class 'NoneType'>
Is there a way to calculate the gradients of Bernoulli samples?
(My TensorFlow version is 0.12).
You cannot backprop through a discrete stochastic node, because the gradients are not defined there.
However, if you approximate the Bernoulli with a continuous distribution controlled by a temperature parameter, then yes, you can.
This idea is called the reparametrization trick, and it is implemented by RelaxedBernoulli in TensorFlow Probability (and also in the tf.contrib library).
You specify the probability p of your Bernoulli, which is your random variable, plus a temperature, et voilà.
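A minimal sketch with current TensorFlow Probability (assumes TF 2.x; the question's TF 0.12 would instead use the tf.contrib variant inside a session):

import tensorflow as tf
import tensorflow_probability as tfp

p = tf.Variable([0.2, 0.6])
dist = tfp.distributions.RelaxedBernoulli(temperature=0.5, probs=p)

with tf.GradientTape() as tape:
    s = dist.sample()        # differentiable relaxed sample in (0, 1)
    loss = tf.reduce_sum(s)  # stand-in for any downstream loss

print(tape.gradient(loss, p))  # well-defined gradients w.r.t. p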
The following code uses the TensorFlow library, and it runs terribly slowly compared to the NumPy version. I am aware that I am calling a function that uses the TensorFlow library inside a Python for loop (which I will parallelize with Python multiprocessing later), but the code as is runs extremely slowly.
Could someone please help me make this code run faster? Thanks.
from math import *
import numpy as np
import sys
from multiprocessing import Pool
import tensorflow as tf
def Trajectory_Fun(tspan, a, b, session=None, server=None):
    # Open a tensorflow session unless one was passed in
    if session is None:
        if server is None:
            sess = tf.Session()
        else:
            sess = tf.Session(server.target)
    else:
        sess = session
    B = np.zeros(np.size(tspan), dtype=np.float64)
    B[0] = b
    for i, t in enumerate(tspan):
        r = np.random.rand(1)
        if r > a:
            c = sess.run(tf.trace(tf.random_normal((4, 4), r, 1.0)))
        else:
            c = 0.0  # sess.run(tf.trace(tf.random_normal((4, 4), 0.0, 1.0)))
        B[i] = c
    # Close the tensorflow session if we opened it
    if session is None:
        sess.close()
    return B

def main(argv):
    # Parameters
    tspan = np.arange(0.0, 1000.0)
    a = 0.1
    b = 0.0
    # Run test program
    B = Trajectory_Fun(tspan, a, b, None, None)
    print 'Done!'

if __name__ == "__main__":
    main(sys.argv[1:])
As stated in your question, this program will give poor performance because it creates several new TensorFlow graph nodes per operation. The underlying assumption in TensorFlow is (approximately) that you'll build a graph once and then call sess.run() on (various parts of) it multiple times. The first time you run a graph is relatively expensive, because TensorFlow has to build various data structures and optimize the execution of the graph across multiple devices.
However, TensorFlow caches this work, so subsequent uses are much cheaper.
You can make this program much faster by constructing the graph once and using (e.g.) a tf.placeholder() op to feed in the value that changes in each iteration. For example, the following should do the trick:
B = np.zeros(np.size(tspan), dtype=np.float64)
B[0] = b

# Define the TensorFlow graph once and reuse it in each iteration of the for loop.
r_placeholder = tf.placeholder(tf.float32, shape=[])
out_t = tf.trace(tf.random_normal((4, 4), r_placeholder, 1.0))

with tf.Session() as sess:
    for i, t in enumerate(tspan):
        r = np.random.rand()  # scalar, to match the scalar placeholder
        if r > a:
            c = sess.run(out_t, feed_dict={r_placeholder: r})
        else:
            c = 0.0
        B[i] = c
return B
You could potentially make this even more efficient by using a TensorFlow loop and making fewer calls to sess.run(), but the general principle is the same: reuse the same graph multiple times to get the benefit of TensorFlow.
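For instance, a sketch of such an in-graph loop with tf.while_loop and a TensorArray (assumes TF 1.x, with tspan and a as in the question); the whole trajectory is then produced by a single sess.run() call:

n = np.size(tspan)

def body(i, ta):
    r = tf.random_uniform([])
    c = tf.cond(r > a,
                lambda: tf.trace(tf.random_normal((4, 4), r, 1.0)),
                lambda: tf.constant(0.0))
    return i + 1, ta.write(i, c)

_, ta = tf.while_loop(lambda i, ta: i < n,
                      body,
                      [tf.constant(0), tf.TensorArray(tf.float32, size=n)])

with tf.Session() as sess:
    B = sess.run(ta.stack())  # one run() for the whole trajectory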
I'm trying to write my own cost function in TensorFlow, but apparently I cannot 'slice' the tensor object?
import tensorflow as tf
import numpy as np
# Establish variables
x = tf.placeholder("float", [None, 3])
W = tf.Variable(tf.zeros([3,6]))
b = tf.Variable(tf.zeros([6]))
# Establish model
y = tf.nn.softmax(tf.matmul(x,W) + b)
# Truth
y_ = tf.placeholder("float", [None,6])
def angle(v1, v2):
    return np.arccos(np.sum(v1*v2, axis=1))

def normVec(y):
    return np.cross(y[:,[0,2,4]], y[:,[1,3,5]])
angle_distance = -tf.reduce_sum(angle(normVec(y_),normVec(y)))
# This is the example code they give for cross entropy
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
I get the following error:
TypeError: Bad slice index [0, 2, 4] of type <type 'list'>
At present, TensorFlow can't gather on axes other than the first; that feature has been requested.
But for what you want to do in this specific situation, you can transpose, then gather 0,2,4, and then transpose back. It won't be crazy fast, but it works:
tf.transpose(tf.gather(tf.transpose(y), [0,2,4]))
This is a useful workaround for some of the limitations in the current implementation of gather.
(But it is also correct that you can't use a NumPy slice on a TensorFlow node: you can only run the node and slice the output. It is also true that you need to initialize those variables before you run. :)) You're mixing tf and np in a way that doesn't work.
x = tf.Something(...)
is a TensorFlow graph object. NumPy has no idea how to cope with such objects.
foo = sess.run(x)
is back to an object Python can handle.
You typically want to keep your loss calculation in pure TensorFlow, so do the cross product and other functions in tf. You'll probably have to do the arccos the long way, as tf doesn't have a function for it.
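For example, a sketch of normVec in pure TensorFlow using the transpose/gather workaround from above (tf.cross is available in TF 1.x; in very early versions you would write the cross product out by hand):

def tf_norm_vec(y):
    even = tf.transpose(tf.gather(tf.transpose(y), [0, 2, 4]))  # y[:, [0, 2, 4]]
    odd = tf.transpose(tf.gather(tf.transpose(y), [1, 3, 5]))   # y[:, [1, 3, 5]]
    return tf.cross(even, odd)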
I just realized that the following fails:
cross_entropy = -tf.reduce_sum(y_*np.log(y))
You can't use NumPy functions on tf objects, and the indexing may be different too.
I think you can use the "Wraps Python function" method (tf.py_func) in TensorFlow. Here's the link to the documentation.
And as for the people who answered "Why don't you just use TensorFlow's built-in functions to construct it?": sometimes the cost function people are looking for cannot be expressed in tf's functions, or is extremely difficult to express.
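A sketch of that approach for the NumPy-based angle computation (TF 1.x tf.py_func; v1_t and v2_t are hypothetical names for whatever tensors hold your normal vectors, and note that gradients do not flow through py_func, which matters if this loss is used for training):

def np_angle(v1, v2):
    # plain NumPy, executed by the TF runtime when the graph runs
    return np.arccos(np.sum(v1 * v2, axis=1)).astype(np.float32)

angle_op = tf.py_func(np_angle, [v1_t, v2_t], tf.float32)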
This is because you have not initialized your variables, and because of this the graph does not have your tensor values right now (you can read more in my answer here).
Just do something like this:
def normVec(y):
    print y
    return np.cross(y[:,[0,2,4]], y[:,[1,3,5]])

t1 = normVec(y_)
# and comment everything after it.
You will see that you do not have actual values yet, only Tensor("Placeholder_1:0", shape=TensorShape([Dimension(None), Dimension(6)]), dtype=float32).
Try initializing your variables
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
and then evaluate your variable with sess.run(y). P.S. You have not fed your placeholders yet.