How to free memory from mx.ndarray - mxnet

I wrote a custom op in the Python layer, implemented with operators of the form mx.nd.op_name. It works normally when the input shape stays the same, but it reports out of memory when the shape changes. The custom op is shown below. It seems that the memory held by self.output isn't freed when the forward function returns. I tried del self.output, but it didn't help. Can you provide some suggestions?
def forward(...):
    self.output = mx.nd.op_name(inputs)
    # do others
def backward(...):
    # do backward

Related

Tensorflow delete graph and free up resources

I create a tensorflow graph and define some tensors and run some stuff. When I'm done, I'd like to delete the graph that I made, and free up all of the resources. How can I do that thing?
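import numpy as np
import tensorflow as tf

temporary_graph = tf.Graph()
with temporary_graph.as_default(), tf.Session() as sess:
    foo = tf.placeholder(tf.float32, (2, 2))
    bar = foo @ foo  # matrix product, shows up as 'matmul' below
    res = sess.run(bar, feed_dict={foo: np.ones((2, 2))})
    print(res)
delete_graph_and_free_up_resources(temporary_graph)  # the function being asked for (does not exist)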
This answer claims that the context manager cleans up the graph, but this isn't the case, and the docs don't claim such a thing:
>>> temporary_graph.get_operations()
[<tf.Operation 'Placeholder' type=Placeholder>, <tf.Operation 'matmul' type=MatMul>]
What is the best way to dispose of a graph?
It is not so simple. In order to free the resources that a graph is using, you need to lose every reference to that graph so that Python can delete it from memory. That means deleting direct references to the graph, but also objects referencing the graph (transitively). That includes operations, tensors and sessions, among other things. In your example, you would need to do:
del temporary_graph, sess, foo, bar, res
And that should make it possible to have the memory freed (I'm not sure whether you might need to call the garbage collector in some cases).
As you may note, you cannot do this in a function, as it depends on the live references in your program. However, if you keep all references related to the graph within a function or object, you should be able to do it fine.
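The reference argument above can be illustrated without TensorFlow. This is a minimal pure-Python sketch; FakeGraph and FakeOp are hypothetical stand-ins (not TensorFlow classes) for a graph and an operation that point at each other. The weak reference observes the graph without keeping it alive, and because the two objects form a reference cycle, a garbage-collector pass is what finally reclaims them:

```python
import gc
import weakref


class FakeGraph:
    """Hypothetical stand-in for a graph that owns its operations."""

    def __init__(self):
        self.ops = []


class FakeOp:
    """Hypothetical stand-in for an operation that points back to its graph."""

    def __init__(self, graph):
        self.graph = graph       # op -> graph
        graph.ops.append(self)   # graph -> op, forming a reference cycle


def demo():
    graph = FakeGraph()
    op = FakeOp(graph)
    graph_ref = weakref.ref(graph)  # does not keep the graph alive

    del graph
    # `op` still references the graph, so it cannot be freed yet.
    alive_while_op_held = graph_ref() is not None

    del op
    # Only the unreachable graph<->op cycle remains; the cycle collector frees it.
    gc.collect()
    freed_after_collect = graph_ref() is None
    return alive_while_op_held, freed_after_collect


print(demo())  # (True, True)
```

Deleting only the graph variable is not enough while any other object still points at it, which is why the answer deletes the session, tensors and results too.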
I'm using tensorflow keras, but my approach is to simply clear the session:
tensorflow.keras.backend.clear_session()

Kernel's hyper-parameters; initialization and setting bounds

I think many other people like me might be interested in how they can use GPflow for their specific problems. The key question is how customizable GPflow is, and a good example would be very helpful.
In my case, I read and tried lots of comments in raised issues without any real success. Setting kernel model parameters is not straightforward (creating the kernel with default values and then changing them via the delete-object method). The transform method is vague.
It would be really helpful if you could add an example showing how one can initialize and set bounds on an anisotropic kernel model (length-scale values and bounds, variances, ...), and especially how to add observation errors (as an array-like alpha parameter).
If you just want to set a value, then you can do
import numpy as np
import gpflow

model = gpflow.models.GPR(np.zeros((1, 1)),
                          np.zeros((1, 1)),
                          gpflow.kernels.RBF(1, lengthscales=0.2))
Alternatively:
model = gpflow.models.GPR(np.zeros((1, 1)),
                          np.zeros((1, 1)),
                          gpflow.kernels.RBF(1))
model.kern.lengthscales = 0.2
If you want to change the transform, you either need to subclass the kernel, or you can also do
with gpflow.defer_build():
    model = gpflow.models.GPR(np.zeros((1, 1)),
                              np.zeros((1, 1)),
                              gpflow.kernels.RBF(1))
# bound the lengthscale to the interval (0.1, 1.0)
transform = gpflow.transforms.Logistic(0.1, 1.)
model.kern.lengthscales = gpflow.params.Parameter(0.3, transform=transform)
model.compile()
You need the defer_build to stop the graph being compiled before you've changed the transform. Using the approach above, the compilation of the tensorflow graph is delayed (until the explicit model.compile()) so is built with the intended bounding transform.
Using an array parameter for likelihood variance is outside the scope of gpflow. For what it's worth (and because it has been asked about before), that particular model is especially problematic as it is not clear how test points are defined.
Setting kernel parameters can be done using the .assign() function, or through direct assignment. See the notebook https://github.com/GPflow/GPflow/blob/develop/doc/source/notebooks/understanding/tf_graphs_and_sessions.ipynb. You do not need to delete a parameter to assign a new value to it.
If you want to have per-datapoint noise, you will need to implement your own custom likelihood, which you can do by taking the Gaussian likelihood in likelihoods.py as an example.
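As a hedged sketch of what such a custom likelihood would have to compute, here is the log-density of a Gaussian likelihood whose variance differs per datapoint, written in plain NumPy rather than as a GPflow class. The function name and the noise_var argument are hypothetical; a GPflow likelihood would express the same formula with TensorFlow ops:

```python
import numpy as np


def heteroscedastic_gaussian_logdensity(y, f, noise_var):
    """Sum over datapoints of log N(y_i | f_i, noise_var_i).

    noise_var holds one variance per observation (the array-like
    observation error from the question), instead of a single scalar.
    """
    y, f, noise_var = map(np.asarray, (y, f, noise_var))
    return np.sum(-0.5 * np.log(2.0 * np.pi * noise_var)
                  - 0.5 * (y - f) ** 2 / noise_var)


# Per-point variances: the second observation is trusted much less.
print(heteroscedastic_gaussian_logdensity([0.0, 1.0], [0.1, 0.5], [0.01, 1.0]))
```

With all entries of noise_var equal, this reduces to the usual single-variance Gaussian log-likelihood.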
If by "bounds" you mean limiting the optimisation range for a parameter, you can use the Logistic transform. If you want to pass in a custom transformation for a parameter, you can pass a constructed Parameter object into constructors with a custom transform. Alternatively you can assign a newly created Parameter with a new transform to the model.
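To see why a logistic transform bounds a parameter, here is a minimal pure-Python sketch of the idea, assuming the transform maps an unconstrained value x to a + (b - a) * sigmoid(x). GPflow's own class differs in detail, and the function names here are hypothetical. The optimiser updates x freely, but the constrained value always lies in (a, b); the bounds 0.1 and 1.0 mirror Logistic(0.1, 1.) above:

```python
import math


def logistic_forward(x, a, b):
    """Map an unconstrained real x into the interval (a, b)."""
    return a + (b - a) / (1.0 + math.exp(-x))


def logistic_backward(y, a, b):
    """Inverse map: take a value inside (a, b) back to the unconstrained space."""
    p = (y - a) / (b - a)
    return math.log(p / (1.0 - p))


a, b = 0.1, 1.0
x = logistic_backward(0.3, a, b)            # what an optimiser would actually update
print(round(logistic_forward(x, a, b), 6))  # 0.3

# However far the optimiser moves x, the constrained value stays within the bounds
# (in floating point, extreme x can round to exactly a or b).
for step in (-50.0, 0.0, 50.0):
    assert a <= logistic_forward(x + step, a, b) <= b
```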
Here is more information on how to access and change GPflow parameters: the "viewing, getting and setting parameters" documentation.
An extra bit for @user1018464's answer about replacing the transform of an existing parameter: changing the transformation is a bit tricky, because you can't change it once the model has been compiled in TensorFlow.
E.g.
likelihood = gpflow.likelihoods.Gaussian()
likelihood.variance.transform = gpflow.transforms.Logistic(1., 10.)
----
GPflowError: Parameter "Gaussian/variance" has already been compiled.
Instead, you have to reset the GPflow object:
likelihood = gpflow.likelihoods.Gaussian() # All tensors compiled
likelihood.clear()
likelihood.variance.transform = gpflow.transforms.Logistic(2, 5)
likelihood.variance = 2.5
likelihood.compile()

Custom op backward

I am writing a custom op, and I got stuck when writing the backward part.
When I call out_grad[0].asnumpy() or do any manipulation of out_grad, the program crashes without any error message.
I tried filling in_grad with zeros and the program runs smoothly, but I need the gradient to flow backward.
def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
    self.assign(in_grad[0], req[0], 0)
    self.assign(in_grad[1], req[1], 0)
What's going wrong here?
Custom Operator in MXNet shows us how to define a loss function using a custom op. The loss op is special because it doesn't need a gradient to flow into it.
But in my situation, I need the gradient to flow into my op. So the function below should return the dependency instead of an empty list as in the loss op.
def declare_backward_dependency(self, out_grad, in_data, out_data):
    return [out_grad[0]]
In my opinion, the dependency is some variable to which the gradient should be delivered.
Have you tried following the tutorial on developing a Custom Operator in MXNet?
If that does not help, provide the full code of your custom operator along with some sample data and a simple model with which this issue can be easily reproduced.

Force copy of tensor when enqueuing

First, I'm not sure the title is very good, but it was the best I could come up with given my understanding of the situation.
The background is that I'm trying to understand how queues work in tensorflow and ran into the following issue which puzzled me.
I have a variable n, which I enqueue to a tf.FIFOQueue, and then I increment the variable. This is repeated several times, and one would expect a result similar to 0, 1, 2, ... However, when emptying the queue all values are the same.
More precisely, the code is as follows:
from __future__ import print_function
import tensorflow as tf
q = tf.FIFOQueue(10, tf.float32)
n = tf.Variable(0, trainable=False, dtype=tf.float32)
inc = n.assign(n+1)
enqueue = q.enqueue(n)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
sess.run(enqueue)
sess.run(inc)
sess.run(enqueue)
sess.run(inc)
sess.run(enqueue)
sess.run(inc)
print(sess.run(q.dequeue()))
print(sess.run(q.dequeue()))
print(sess.run(q.dequeue()))
Which I expect would print:
0.0
1.0
2.0
Instead I get the following result:
3.0
3.0
3.0
It seems like I'm pushing some pointer to n onto the queue instead of the actual value (which is what I want). However, I don't really understand TensorFlow internals, so maybe something else is going on?
I tried changing
enqueue = q.enqueue(n)
to
enqueue = q.enqueue(tf.identity(n))
since answers to "How can I copy a variable in tensorflow" and "In TensorFlow, what is tf.identity used for?" give me the impression that it might help, but it does not change the result. I also tried adding a tf.control_dependencies(), but again, all values are the same when dequeueing.
Edit: The output above is from running the code on a computer with a single CPU. While checking whether different versions of TensorFlow behave differently, I noticed that if I run the code on a computer with a CPU and a GPU, I get the "expected" result. Indeed, with CUDA_VISIBLE_DEVICES="" I get the result above, and with CUDA_VISIBLE_DEVICES="0" I get the "expected" result.
To force a non-caching read you can do
q.enqueue(tf.add(n, 0))
This is what's currently done by the batch-normalization layer to force a copy.
The semantics of how variables get read vs. referenced are in the process of being revamped, so they are temporarily non-intuitive. In particular, I expected q.enqueue(n.read_value()) to force a non-caching read, but it doesn't fix your example on TF 0.12rc1.
On a GPU machine the variable is placed on the GPU, while the queue is CPU-only, so the enqueue op forces a GPU->CPU copy.
In case it helps, I've found that the other answers, although correct, do not work for all dtypes.
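The effect in the question can be reproduced as a plain-Python analogy. This is not how TensorFlow is implemented: Cell is a hypothetical mutable buffer standing in for the variable's memory, and the deque stands in for the FIFOQueue. Enqueuing an alias of the buffer gives 3.0, 3.0, 3.0, while enqueuing a copy (which is what tf.add(n, 0) forces) gives 0.0, 1.0, 2.0:

```python
from collections import deque


class Cell:
    """Hypothetical mutable buffer standing in for the variable's memory."""

    def __init__(self, value):
        self.value = value


def run(copy_on_enqueue):
    n = Cell(0.0)
    q = deque()
    for _ in range(3):
        # Enqueue either the shared buffer itself (an alias) or a fresh copy of it.
        q.append(Cell(n.value) if copy_on_enqueue else n)
        n.value += 1.0  # the "inc" op mutates the buffer in place
    return [q.popleft().value for _ in range(3)]


print(run(copy_on_enqueue=False))  # [3.0, 3.0, 3.0]
print(run(copy_on_enqueue=True))   # [0.0, 1.0, 2.0]
```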
For example, this works fine with floats or ints but fails when n is a string tensor:
q.enqueue(tf.add(n, 0))
This one fails when the queue uses tuples with heterogeneous types (e.g., ints and floats):
q.enqueue_many([[n]])
So, if you see yourself caught in any of these situations try this instead:
q.enqueue(tf.add(n, tf.zeros_like(n)))
Or, to enqueue a tuple t:
q.enqueue([tf.add(n, tf.zeros_like(n)) for n in t])
That works even for string tensors and heterogeneous tuple types.
Hope it helps!
--
Update: it looks like tf.bool types do not work with tf.zeros_like(). For those, an explicit cast to an integer type might be needed.

How to assign values to a subset of a tensor in tensorflow?

Two parts to this question:
(1) What is the best way to update a subset of a tensor in tensorflow? I've seen several related questions:
Adjust Single Value within Tensor -- TensorFlow
and
How to update a subset of 2D tensor in Tensorflow?
and I'm aware that Variable objects can be assigned using Variable.assign() (and/or scatter_update, etc.), but it seems very strange to me that tensorflow does not have a more intuitive way to update a part of a Tensor object. I have searched through the tensorflow api docs and stackoverflow for quite some time now and can't seem to find a simpler solution than what is presented in the links above. This seems particularly odd, especially given that Theano has an equivalent version with Tensor.set_subtensor(). Am I missing something or is there no simple way to do this through the tensorflow api at this point?
(2) If there is a simpler way, is it differentiable?
Thanks!
I suppose the immutability of Tensors is required for the construction of a computation graph; you can't have a Tensor update some of its values without becoming another Tensor or there will be nothing to put in the graph before it. The same issue comes up in Autograd.
It's possible to do this (but ugly) using boolean masks (make them variables and use assign, or even define them prior in numpy). That would be differentiable, but in practice I'd avoid having to update subtensors.
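For the mask approach, the core is a select between the old tensor and the replacement values. Below is a minimal NumPy sketch (the helper name is hypothetical); in TensorFlow, tf.where(mask, values, a) plays the same role. Since each output element is simply one of its inputs, the gradient flows to a where the mask is False and to values where it is True:

```python
import numpy as np


def set_subtensor_masked(a, values, mask):
    """Functional update: result[i] = values[i] where mask[i], else a[i].

    `a` itself is never modified; a new array is returned, which is
    what immutable Tensors force you to do anyway.
    """
    return np.where(mask, values, a)


a = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([False, True, True, False])  # positions to overwrite
values = np.full_like(a, 9.0)                # replacement values, same shape as a
print(set_subtensor_masked(a, values, mask))  # [1. 9. 9. 4.]
```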
If you really have to (and I really hope there is a better way to do this), here is a way to do it in 1D using tf.dynamic_stitch and tf.setdiff1d:
import tensorflow as tf

def set_subtensor1d(a, b, slice_a, slice_b):
    # a[slice_a] = b[slice_b]
    a_range = tf.range(a.shape[0])
    # indices of a that are not overwritten (kept from a)
    _, a_from = tf.setdiff1d(a_range, a_range[slice_a])
    a_to = a_from
    # indices to read from b, and where they land in the result
    b_from, b_to = tf.range(b.shape[0])[slice_b], a_range[slice_a]
    return tf.dynamic_stitch([a_to, b_to],
                             [tf.gather(a, a_from), tf.gather(b, b_from)])
For higher dimensions this could be generalised by abusing reshape (where nd_slice is a helper, not shown here, that indexes a tensor with a tuple of slices; there is probably a better way to write it):
def set_subtensornd(a, b, slice_tuple_a, slice_tuple_b):
    # a[*slice_tuple_a] = b[*slice_tuple_b]
    # nd_slice(t, slice_tuple) is assumed to return t[slice_tuple]
    # label every element of a with its flat index
    a_range = tf.range(tf.reduce_prod(tf.shape(a)))
    a_idxed = tf.reshape(a_range, tf.shape(a))
    # flat indices of a that get overwritten, and those that are kept
    a_dropped = tf.reshape(nd_slice(a_idxed, slice_tuple_a), [-1])
    _, a_from = tf.setdiff1d(a_range, a_dropped)
    a_to = a_from
    # flat indices of b to copy from; they land on the dropped positions of a
    b_range = tf.range(tf.reduce_prod(tf.shape(b)))
    b_idxed = tf.reshape(b_range, tf.shape(b))
    b_from = tf.reshape(nd_slice(b_idxed, slice_tuple_b), [-1])
    b_to = a_dropped
    # stitch the flattened pieces together, then restore a's shape
    a_flat, b_flat = tf.reshape(a, [-1]), tf.reshape(b, [-1])
    stitched = tf.dynamic_stitch([a_to, b_to],
                                 [tf.gather(a_flat, a_from), tf.gather(b_flat, b_from)])
    return tf.reshape(stitched, tf.shape(a))
I have no idea how slow this will be. I'd guess quite slow. And, I haven't tested it much beyond running it on a couple of tensors.
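To check what the index bookkeeping in set_subtensor1d does without a TensorFlow session, here is a NumPy re-implementation of the same idea. The dynamic_stitch function below is a simplified, hypothetical stand-in for tf.dynamic_stitch (1-D data only), not TensorFlow's implementation; note that np.setdiff1d returns values rather than indices, which coincide here because a_range is 0..n-1:

```python
import numpy as np


def dynamic_stitch(indices, data):
    """Simplified 1-D sketch of tf.dynamic_stitch: merged[indices[i][j]] = data[i][j]."""
    size = max((int(idx.max()) for idx in indices if idx.size), default=-1) + 1
    merged = np.empty(size, dtype=np.result_type(*data))
    for idx, vals in zip(indices, data):
        merged[idx] = vals  # in this sketch, later lists win on duplicate indices
    return merged


def set_subtensor1d(a, b, slice_a, slice_b):
    """Functional a[slice_a] = b[slice_b] using the same index bookkeeping."""
    a_range = np.arange(a.shape[0])
    a_from = np.setdiff1d(a_range, a_range[slice_a])  # positions of a to keep
    b_from = np.arange(b.shape[0])[slice_b]           # positions of b to copy
    b_to = a_range[slice_a]                           # where they land in the result
    return dynamic_stitch([a_from, b_to], [a[a_from], b[b_from]])


a = np.arange(6.0)
b = np.array([10.0, 20.0, 30.0])
print(set_subtensor1d(a, b, slice(1, 3), slice(0, 2)).tolist())
# [0.0, 10.0, 20.0, 3.0, 4.0, 5.0]
```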