Specifically, how to train neural network when it is larger than ram? - tensorflow

I have specific questions about how to train a neural network that is larger than ram. I want to use the de facto standard which appears to be Keras and tensorflow.
What are the key classes and methods that I need to use
From Numpy, to scipy, to pandas, h5py, to keras in order to not exceed my meager 8 gb of ram? I have time to train the model; I don't have cash. My dataset requires 200 GB of ram.
In keras there is a model_fit() method. It requires X and Y numpy arrays. How do I get it to accept hdf5 numpy arrays on disk? And when specifying the model architecture itself How do I save ram because wouldn't the working memory require > 8 gb at times?
Regarding fit_generator, does that accept hdf5 files? If the model_fit() method can accept hdf5, do I even need fit generator? It seems that you still need to be able to fit the entire model in ram even with these methods?
In keras does the model include the training data when calculating its memory requirements? If so I am in trouble I think.
In essence I am under the assumption that at no time can I exceed my 8 Gb of ram, whether from one hot encoding to loading the model to training on even a small batch of samples. I am just not sure how to accomplish this concretely.

I cannot answer everything, and I'm also very interested in those answers, because I'm facing that 8GB problem too.
I can only suggest how to pass little batches at a time.
Question 2:
I don't think Keras will support passing the h5py file (but I really don't know), but you can create a loop to load the file partially (if the file is properly saved for that).
You can create an outer loop to:
create a little array with only one or two samples from the file
use the method train_on_batch passing only that little array.
release the memory disposing of the array or filling this same array with the next sample(s).
Question 3:
Also don't know about the h5py file, is the object that opens the file a python generator?
If not, you can create the generator yourself.
The idea is to make the generator load only part of the file and yield little batch arrays with one or two data samples. (Pretty much the same as done in question 2, but the loop goes inside a generator.

Usually for very large sample sets an "online" training method is used. This means that instead of training your neural network in one go with a large batch, it allows the neural network to be updated incrementally as more samples are obtained. See: Stochastic Gradient Descent

Related

What happens if I train a large deep learning model using a graphics card with insufficient VRAM?

This is just a general question rather than a specific issue.
For an upcoming school assignment I'll need to use OpenCV. From what the professor told me an OpenCV model can take up to 8GB memory, but my graphics card (GTX 960) has only 2GB of VRAM. What will happen if I try training a model larger than 2GB? Can Tensorflow make use of my CPU memory to store the model?
CV2 is typically used to do some processing on images, like reading them in or resizing them. The model is independent of cv2 except where cv2 is used to input and preprocess the images and prepare them as an input to the model. The amount of memory you will use depends on the image sizes and on the model you build. To avoid requiring a lot of memory you typically do not want to put all your images into memory at once but instead process them in batches. Keras provides an number of ways to do that. Documentation for that is located here.

Keras alternative to ImageDataGenerator for loading arbitrary numpy tensor

The Keras's ImageDataGenerator looks great for simply progressively loading images and passing an iterator to the model.fit function. However, it seems to be only usable for images and for classification tasks.
I want to do regression, i.e., my labels are also arrays of the same shape as the training set ones. In practice, they are multidimensional (>1 channels) arrays like images but they are not images.
Any suggestions on what class to use to simply spit batches of data to a keras model.fit() for training a deep neural net?
The problem, of course, is that my datasets are much too large to fit in memory, which is why I need to use these generators/iterators.
The best solution for your case is to use tf.data.Dataset().
While it may take a relatively short time to accustom to it, it is the recommended way to load your data and use model.fit().
You can consult the documentation here: https://www.tensorflow.org/api_docs/python/tf/data/Dataset
Is is new, fast, beautifully designed and easily extensible.
For instance, for your problem you may want to use tf.data.Dataset.from_tensor_slices(); I will leave you discover its features :D.
A quick solution would be to use Colab whose GPU instance has got 24 GB RAM to work with . You could also reduce your memory when you load the numpy array like the way I did here

can tf alone train a 20 million-plus rows dataset?

I wanna train a model in tf using a dataset of more than 20 million rows. Are there any limitations/errors that may happen when performing this? Are there any methods/techniques I could try to effectively perform this?. The problem is simple classification one but I've never trained with such a large dataset. Any advice would be helpful. Thanks
TensorFlow can handle petabytes of information passed across tens of thousands of GPUs - the question is, does your code manage resources properly, and can your hardware handle it? This is called distributed training. The topic is very broad, but you can get started with setting up a GPU - that includes installing CUDA & cuDNN. You can also refer to input data pipeline optimization.
I suggest handling all your installs via Anaconda 3, as it handles package compatibility - here's a guide or two to get started.
Lastly, your main hardware constraints are RAM and GPU memory; former for the maximum array size a model can process (e.g. 8GB), and latter for maximum model size the GPU can fit.

Tensorflow out of memory

I am using tensorflow to build CNN based text classification. Some of the datasets are large and some are small.
I use feed_dict to feed the network by sampling data from system memory (not GPU memory). The network is trained batch by batch. The batch size is 1024 fixed for every dataset.
My question is:
The network is trained by batches, and each batch the code retrieve data from system memory. Therefore, no matter how large the dataset is the code should handle it like the same, right?
But I got out of memory problem with large dataset, and for small dataset it works fine. I am pretty sure the system memory is enough for holding all the data. So the OOM problem is about tensorflow, right?
Is it that I write my code wrong, or is it about tensorflow's memory management?
Thanks a lot!
I think your batch size is way too big with 1024. There is a lot of matrices overhead created, especially if you use AgaGrad Adam and the like, dropout, attention and/or more. Try smaller values, like 100, as batchsize. Should solve and train just fine.

TensorFlow: How to measure how much GPU memory each tensor takes?

I'm currently implementing YOLO in TensorFlow and I'm a little surprised on how much memory that is taking. On my GPU I can train YOLO using their Darknet framework with batch size 64. On TensorFlow I can only do it with batch size 6, with 8 I already run out of memory. For the test phase I can run with batch size 64 without running out of memory.
I am wondering how I can calculate how much memory is being consumed by each tensor? Are all tensors by default saved in the GPU? Can I simply calculate the total memory consumption as the shape * 32 bits?
I noticed that since I'm using momentum, all my tensors also have a /Momentum tensor. Could that also be using a lot of memory?
I am augmenting my dataset with a method distorted_inputs, very similar to the one defined in the CIFAR-10 tutorial. Could it be that this part is occupying a huge chunk of memory? I believe Darknet does the modifications in the CPU.
Now that 1258 has been closed, you can enable memory logging in Python by setting an environment variable before importing TensorFlow:
import os
os.environ['TF_CPP_MIN_VLOG_LEVEL']='3'
import tensorflow as tf
There will be a lot of logging as a result of this. You'll want to grep the results to find the appropriate lines. For example:
grep MemoryLogTensorAllocation train.log
Sorry for the slow reply. Unfortunately right now the only way to set the log level is to edit tensorflow/core/platform/logging.h and recompile with e.g.
#define VLOG_IS_ON(lvl) ((lvl) <= 1)
There is a bug open 1258 to control logging more elegantly.
MemoryLogTensorOutput entries are logged at the end of each Op execution, and indicate the tensors that hold the outputs of the Op. It's useful to know these tensors since the memory is not released until the downstream Op consumes the tensors, which may be much later on in a large graph.
See the description in this (commit).
The memory allocation is raw info is there although it needs a script to collect the information in an easy to read form.