I want to do cross-validation in my project, which is based on PyTorch.
I didn't find any method provided by PyTorch to delete the current model and free the GPU memory. Could you tell me how I can do it?
Freeing memory in PyTorch works as it does with the normal Python garbage collector. This means that once all references to a Python object are gone, it will be deleted.
You can delete references by using the del operator:
del model
You have to make sure though that there is no reference to the respective object left, otherwise the memory won't be freed.
So once you've deleted all references of your model, it should be deleted and the memory freed.
If you want to learn more about memory management you can take a look here:
https://pytorch.org/docs/stable/notes/cuda.html#cuda-memory-management
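The reference-counting behaviour described above can be sketched in plain Python. This is a minimal illustration, not PyTorch code: TinyModel, is_alive and the history list are hypothetical names chosen for the example, and the weak reference is only a probe to observe when the object is actually destroyed.

```python
import gc
import weakref


class TinyModel:
    """Stand-in for a model object (hypothetical, not a PyTorch class)."""

    def __init__(self, n_params):
        self.params = [0.0] * n_params


def is_alive(ref):
    """Return True while the object behind a weak reference still exists."""
    return ref() is not None


model = TinyModel(1000)
probe = weakref.ref(model)   # a weak reference does not keep the object alive
history = [model]            # a second, easily forgotten strong reference

del model                    # removes one name, but `history` still holds the object
alive_after_first_del = is_alive(probe)

history.clear()              # drop the last strong reference
gc.collect()                 # only strictly needed when reference cycles exist
alive_after_cleanup = is_alive(probe)

print(alive_after_first_del, alive_after_cleanup)
```

In a cross-validation loop the same reasoning would apply to anything that still points at the model or its tensors, such as an optimizer or a list of stored loss tensors. Note also that PyTorch's caching allocator keeps freed GPU memory reserved for reuse, so tools like nvidia-smi may still report it as used; calling torch.cuda.empty_cache() after the references are gone releases the unused cached blocks.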
Can I use the TensorFlow Object Detection API to detect any object that comes into my path, so that I can stop the movement of my product? I have done customized object detection before, but here I can't train on every object that might interrupt my product's path. So is it possible to use the TensorFlow API as a kind of collision detection?
With object detection, you can identify objects and the object's location and extent on an image. This would be an option to check if specific objects are blocking your path. There is also the option of detecting/segmenting unknown objects (as described here). However, what you are after sounds more like depth estimation or even SLAM.
One example for depth estimation is monodepth - a neural network that can estimate the depth for each pixel from a single camera image. You can use that to verify if your path is clear or if something in front of your product is blocking the path.
The other one, SLAM (simultaneous localization and mapping), might be a bit over the top for just checking if you can drive somewhere. Anyway, SLAM solves the task of navigating an unknown environment by building an internal model of the world while at the same time estimating its own location inside this model.
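As a rough sketch of the depth-based check mentioned above: assuming a depth network has already produced a per-pixel depth map and that its values have been converted to distances, checking the path reduces to looking for any pixel closer than a threshold inside the region the product will move through. The function name path_is_clear, the corridor tuple and the sample values are all assumptions made for illustration.

```python
def path_is_clear(depth_map, min_distance, corridor):
    """Return True if no pixel inside the corridor is closer than min_distance.

    depth_map: list of rows, each a list of distances (units are an assumption)
    corridor: (row_start, row_end, col_start, col_end), end-exclusive
    """
    row_start, row_end, col_start, col_end = corridor
    for row in depth_map[row_start:row_end]:
        for distance in row[col_start:col_end]:
            if distance < min_distance:
                return False
    return True


# Toy 3x4 depth map with one close obstacle at row 1, column 1.
depth_map = [
    [5.0, 5.0, 5.0, 5.0],
    [4.0, 0.8, 4.0, 4.0],
    [3.0, 3.0, 3.0, 3.0],
]

left_clear = path_is_clear(depth_map, min_distance=1.0, corridor=(0, 3, 0, 2))
right_clear = path_is_clear(depth_map, min_distance=1.0, corridor=(0, 3, 2, 4))
```

Here the left half of the image contains the obstacle and the right half does not. A real system would likely need to filter noisy depth values before applying a hard threshold like this.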
The model I have contains a huge number of agents. However, I wish to delete/eliminate some agents that have done their job during run time, in order to release memory, speed up model execution and avoid OOM.
Does context.remove() really eliminate/kill the agent (object) permanently? Is memory released after this operation? If not, what is the correct procedure?
Yes, that's right. Unless you have some other reference to the agent, removing it from the context will allow the memory to be garbage collected.
(I have posted the question on https://github.com/tensorflow/federated/issues/793 and maybe also here!)
I have adapted my own data and model to the federated interfaces, and the training converged. But I am confused about one issue: in an image classification task, the whole dataset is extremely large, and it can neither be stored in a single federated_train_data nor be loaded into memory all at once. So I need to load the dataset from the hard disk into memory in batches on the fly and use Keras model.fit_generator instead of model.fit during training, the approach people use to deal with large data.
I suppose that in the iterative_process shown in the image classification tutorial, the model is fitted on a fixed set of data. Is there any way to adjust the code to let it fit to a data generator? I have looked into the source code but am still quite confused. I would be incredibly grateful for any hints.
Generally, TFF considers the feeding of data to be part of the "Python driver loop", which is a helpful distinction to make when writing TFF code.
In fact, when writing TFF, there are generally three levels at which one may be writing:
TensorFlow defining local processing (i.e., processing that will happen on the clients, or on the server, or in the aggregators, or at any other placement one may want, but only a single placement).
Native TFF defining the way data is communicated across placements. For example, writing tff.federated_sum inside of a tff.federated_computation decorator; writing this line declares "this data is moved from clients to server, and aggregated via the sum operator".
Python "driving" the TFF loop, e.g. running a single round. It is the job of this final level to do what a "real" federated learning runtime would do; one example here would be selecting the clients for a given round.
If this breakdown is kept in mind, using a generator or some other lazy-evaluation-style construct to feed data in to a federated computation becomes relatively simple; it is just done at the Python level.
One way this could be done is via the create_tf_dataset_for_client method on the ClientData object; as you loop over rounds, your Python code can select from the list of client_ids, then you can instantiate a new list of tf.data.Datasets and pass them in as your new set of client data. An example of this relatively simple usage would be here, and a more advanced usage (involving defining a custom client_datasets_fn which takes client_id as a parameter, and passing it to a separately-defined training loop) would be here, in the code associated to this paper.
One final note: instantiating a tf.data.Dataset does not actually load the dataset into memory; the dataset is only loaded in when it is iterated over. One helpful tip I have received from the lead author of tf.data.Dataset is to think of tf.data.Dataset more as a "dataset recipe" than a literal instantiation of the dataset itself. It has been suggested that perhaps a better name would have been DataSource for this construct; hopefully that may help the mental model on what is actually happening. Similarly, using the tff.simulation.ClientData object generally shouldn't really load anything into memory until it is iterated over in training on the clients; this should make some nuances around managing dataset memory simpler.
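The "Python driver loop" idea can be sketched without any TFF dependency. In the sketch below, client_batches is a hypothetical lazy loader and run_round is a hypothetical placeholder for one federated round (in TFF, that step would typically be a call to the iterative process's next method with the current state and the list of client datasets). Only the control flow is the point: clients are sampled per round, and their data is constructed lazily and read only when the round consumes it.

```python
import random


def client_batches(client_id, batch_size, num_examples):
    """Lazily yield batches for one client (hypothetical loader).

    A real loader would read files from disk here; this stub yields
    lists of integer example indices so the sketch stays runnable.
    """
    for start in range(0, num_examples, batch_size):
        yield list(range(start, min(start + batch_size, num_examples)))


def run_round(state, client_data):
    """Placeholder for one federated round (hypothetical, not a TFF call).

    It only counts the examples seen, consuming each generator once.
    """
    seen = sum(len(batch) for batches in client_data for batch in batches)
    return state + seen


client_ids = ["client_%d" % i for i in range(10)]
rng = random.Random(0)  # fixed seed so the sampling is repeatable
state = 0

for round_num in range(3):
    sampled = rng.sample(client_ids, 4)
    # Generators are created per round; nothing is read until run_round iterates.
    client_data = [
        client_batches(cid, batch_size=8, num_examples=20) for cid in sampled
    ]
    state = run_round(state, client_data)
```

Because each round builds fresh generators for only the sampled clients, the memory held at any moment is bounded by the batches currently being processed, not by the size of the full dataset.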
As part of my R&D I was given access to multiple TPUs, but I can't find documentation on how to allocate them together for my training purposes, both node-wise and code-wise. The documentation says ctpu up -zone MY_ZONE_CHOICE, but this command allocates only a single TPU. Similarly, what changes should I make to my code if I want to use multiple TPUs? So far I've used the call tf.contrib.cluster_resolver.TPUClusterResolver() to check for a TPU; what should be changed (if anything) to check whether I can access multiple TPUs?
I'm often using the following pattern for managing control flow:
with tf.get_default_graph().control_dependencies([c_op]):
    h_state = tf.identity(h_state)
However, I'm concerned that tf.identity() might copy the data passed to it, which is not what I want. Can somebody confirm whether or not it creates a copy?
The implementation of the tf.identity() operation will forward its input to its output without making a deep copy. However, if the tf.identity() operation is pinned to a different device from the operation that produces its input, a deep copy will occur.