I am using TF as the backend for Keras. I use custom loss functions, so I essentially use Keras as a wrapper for TF. I have a big model consisting of 4 smaller ones; 3 of them are pre-trained and loaded, while the fourth gets trained.
The issue is that when calling
self.session.run(tf.global_variables_initializer())
TF fails with an error about trying to allocate too much memory on the GPU. The model itself has around 280 000 000 params (70 million of them trainable), yet the TF graph has 1 000 000 000 variables. That's where the math doesn't add up for me.
Allocating 1 billion floats should take up around 4 GB of memory, and TF has 5.3 GB of VRAM available. As far as I know, the 1 billion variables should already include all stored activations, gradients, and optimizer params (1 per trained param, since I use RMSprop).
There are very few activations because I only use quite small conv layers, so the activations for the whole network should take around 6.5 MB per sample, and with a batch size of only 32 that is about 208 MB total.
Do you have any idea what's going on here? Does the model just barely not fit, or is there a bigger problem somewhere?
Any advice appreciated!
EDIT: The model definition code: https://pastebin.com/6FRczTc0 (the first function is used for the 4 submodels and the second one puts them together into the bigger net)
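For reference, this is how one can tally how much memory the graph's variables should take (a minimal TF 1.x sketch, run after the model has been built; it assumes all variables are float32):

import numpy as np
import tensorflow as tf

# Sum the number of elements across all variables in the default graph
# and convert to GB, assuming 4 bytes (float32) per element.
total_elements = 0
for v in tf.global_variables():
    total_elements += int(np.prod(v.get_shape().as_list()))
print("%d elements across %d variables, ~%.2f GB as float32"
      % (total_elements, len(tf.global_variables()), total_elements * 4.0 / 1024**3))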
I'm trying out a simple sequential model with the below dataset.
Using Colab PRO with 35 GB RAM + 225 GB Disk space.
Total sentences - 59000
Total words - 160000
Padded seq length - 38
So train_x (59000,37), train_y (59000)
I'm using FastText for the embedding layer. The FastText model generated weights with:
(rows) vocab_size 113000
(columns/dimensionality) embedding_size 8815
Here is what model.summary() looks like:
It takes about 15 minutes to compile the model, but .fit crashes because there isn't enough memory.
I've brought the batch_size down to 4 (vs. the default of 32), but still no luck.
epochs = 2
verbose = 0
batch_size = 4
history = seq_model.fit(train_x, train_y, epochs=epochs, verbose=verbose,
                        callbacks=[csv_logger], batch_size=batch_size)
Appreciate any ideas to make this work.
If what I am seeing is right, your model is simply too large!
It has almost 1.5 billion parameters. That's way too much.
Reducing the batch size will not help at all: the weights have to be allocated regardless of how many samples you feed per step.
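For what it's worth, a quick back-of-the-envelope check using only the numbers quoted in the question shows that the embedding layer alone accounts for roughly a billion of those weights (pre-trained FastText vectors are typically 300-dimensional, so an embedding_size of 8815 may be worth double-checking):

# Numbers taken from the question above.
vocab_size = 113000       # rows of the FastText weight matrix
embedding_size = 8815     # columns ("dimensionality")

embedding_params = vocab_size * embedding_size    # ~996 million weights
gb_float32 = embedding_params * 4.0 / 1024**3     # 4 bytes per float32 weight
print("%d params, ~%.1f GB for the embedding layer alone" % (embedding_params, gb_float32))
# Gradients and most optimizers keep additional copies of this on top.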
Apologies if my questions are relatively simple, but I have only recently started approaching TensorFlow with the aim of learning new skills.
In the example, there are several things I can't work out:
In the explore-data section, the dataset sizes come out as 60k/10k respectively for train and test.
- Where is the train/test split size declared?
- Packages like scikit-learn allow this to be specified as a percentage when invoking the split methods.
In the model-training part, when the 5 epochs are trained, the number 1875 appears below.
- What is that?
- I was expecting the training to run over the 60k items, but even multiplying 1875 by 5 doesn't reach 10k.
The dataset is loaded using the TensorFlow Datasets API.
The source itself defines the split of 60K (train) and 10K (test):
https://www.tensorflow.org/datasets/catalog/fashion_mnist
An epoch is a complete pass over all the training samples. The training is done in batches; in the example you refer to, a batch size of 32 is used, so completing one epoch takes 1875 batches (60000 / 32).
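To see both numbers concretely, here is a small sketch using tf.keras.datasets (the tutorial you followed may load the same data through tensorflow_datasets instead):

import tensorflow as tf

# The 60k/10k sizes come from the dataset itself, not from the tutorial code.
(train_x, train_y), (test_x, test_y) = tf.keras.datasets.fashion_mnist.load_data()
print(train_x.shape)  # (60000, 28, 28)
print(test_x.shape)   # (10000, 28, 28)

# With the default batch_size of 32, one epoch is 60000 / 32 = 1875 batches,
# which is the number printed next to each epoch's progress bar.
batch_size = 32
print(len(train_x) // batch_size)  # 1875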
Hope this helps.
I'm trying to train a YOLOv3 model for 62 classes using https://github.com/wizyoung/YOLOv3_TensorFlow.
How many samples should I take for each class?
I'm using an Nvidia GTX 1050 Ti GPU, so what should my batch size be with images of size 300x300?
Is an 80-20 train/test split ideal?
The 80-20 train/test (val) split depends on the number of samples, not on the number of classes. The more data you have, the more lopsided the split between train and test (val) can be (with millions of samples you can use a 95-5 split).
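As a minimal sketch of such a split with scikit-learn (the image paths and labels here are placeholders for your own annotation lists):

from sklearn.model_selection import train_test_split

# Placeholder data: replace with your real image paths and class labels.
image_paths = ["img_%d.jpg" % i for i in range(1000)]
labels = [i % 62 for i in range(1000)]   # 62 classes, as in the question

train_paths, val_paths, train_labels, val_labels = train_test_split(
    image_paths, labels,
    test_size=0.2,      # 80-20 split
    random_state=42,
    stratify=labels,    # keep class proportions similar in both sets
)
print(len(train_paths), len(val_paths))  # 800 200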
Normally, a minimum of about 200 bounding-box annotations per object should be present. That is, each of your classes should have at least 200 annotations.
The 1050 Ti has only 4 GB of VRAM. Depending on your image_size, you can increase or decrease the batch_size. However, take into consideration that you do not have much VRAM available: most likely a batch_size of 2 for 300x300 images will be the maximum you can achieve (decrease it to 1 if you hit OOM issues).
I've got a 1-layer LSTM model in TensorFlow, and the temperature reading of my GPU gets rather high during the training phase, always varying between 80 C and 90 C. My GPU is a water-cooled GTX 1080 "Super-clocked" edition in a 24/7 refrigerated room. The model works, but this temperature worries me. I'd like to know whether this is normal and safe.
I'm training the LSTM for a next-word-prediction problem with tokenized reddit comments. I got the idea from different tutorials on wildml.com. Here are some details about it:
TensorFlow 1.2.1, CUDA toolkit 8.0, cuDNN 6.0, Nvidia driver 375.66
My training data consists of 200K reddit comments.
My word dictionary consists of 8000 words, which means 8000 classification classes for each prediction.
I use pre-trained 100-dimensional GloVe embeddings of Wikipedia words.
I'm not using placeholders to feed my input; it's all done with TFRecord file readers, which feed the examples into a 100k-capacity random shuffle queue.
From the random shuffle queue, the data goes to a padding FIFO queue, where I generate zero-padded mini-batches of 20.
The size-20 mini-batches go to tf.dynamic_rnn() with an LSTM cell with a hidden dimension of 150.
I mask the losses using tf.sign() and minimize the result with the Adam optimizer (sketched below).
I've noticed that the temperature rises a lot when I raise the mini-batch size. With size-1 mini-batches (single examples), it reads between 72-75 C. With size-10 mini-batches, it immediately goes to 78 C and stays in the range of 78-84 C. With size-20 mini-batches, 84-88 C. With size-30 mini-batches, 87-92 C.
If I raise the hidden dimension to 200, 250, 300, etc., while keeping the mini-batch size fixed, I get similar temperature rises.
I've also trained the same model feeding the data with placeholders only, i.e. not using TFRecords, queues, and mini-batches. It stays around 65 C, but feeding the net with placeholders is obviously far from optimal.
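Here is a minimal sketch of that loss-masking step (TF 1.x style, matching the versions above; the shapes and the random tensors are just illustrative stand-ins for the real pipeline):

import tensorflow as tf

batch_size, max_len, vocab_size, hidden = 20, 40, 8000, 150

# Stand-ins for the tf.dynamic_rnn() outputs and the zero-padded target batch.
rnn_outputs = tf.random_normal([batch_size, max_len, hidden])
targets = tf.random_uniform([batch_size, max_len], maxval=vocab_size, dtype=tf.int32)  # id 0 = padding

# Project the RNN outputs to vocabulary logits.
proj = tf.get_variable("proj", [hidden, vocab_size])
logits = tf.reshape(tf.matmul(tf.reshape(rnn_outputs, [-1, hidden]), proj),
                    [batch_size, max_len, vocab_size])

# tf.sign() turns padding ids (0) into 0 and real word ids into 1.
mask = tf.sign(tf.to_float(targets))
losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=targets, logits=logits)
loss = tf.reduce_sum(losses * mask) / tf.maximum(tf.reduce_sum(mask), 1.0)

train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)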
I really appreciate your help, I'm kinda desperate, to be honest.
-----------------EDIT---------------------
It turns out the water-cooler pump was configured in my BIOS to vary according to the CPU temperature... Obviously the GPU temperature wouldn't affect it, and that's what happened: it was running at 50% of its capacity. Well, I've adjusted it to stay at 100% all the time, and now the same model runs with a max temperature of approx. 83 C. Still not perfect, but a huge improvement. I guess that with the complexity of my model plus the really high 1.8 GHz clock of my GPU, there's not much more I can do.
The maximum design temperature of the GTX 1080 according to nvidia is 94 C. Anything below that and you should be safe.
Maximum GPU Temperature (in C) 94
The fact that the GPU temperature rises when you raise the mini-batch size is a good sign; it means that your GPU is working as hard as it can. In fact, if your GPU is not at ~80-90 C, it is not working at full power and you are losing some performance.
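If you want to keep an eye on it during training, a simple option (assuming the nvidia-smi binary that ships with the driver is on your PATH) is something like:

import subprocess
import time

# Print the GPU temperature every 5 seconds for about a minute.
for _ in range(12):
    temp = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader"])
    print(temp.decode().strip() + " C")
    time.sleep(5)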
I have been trying to use Google's RNN-based seq2seq model.
I have been training a model for text summarization, feeding in textual data of approximately 1 GB. The model quickly fills up my entire RAM (8 GB), starts filling up even the swap memory (a further 8 GB), and crashes, after which I have to do a hard shutdown.
The configuration of my LSTM network is as follows:
model: AttentionSeq2Seq
model_params:
  attention.class: seq2seq.decoders.attention.AttentionLayerDot
  attention.params:
    num_units: 128
  bridge.class: seq2seq.models.bridges.ZeroBridge
  embedding.dim: 128
  encoder.class: seq2seq.encoders.BidirectionalRNNEncoder
  encoder.params:
    rnn_cell:
      cell_class: GRUCell
      cell_params:
        num_units: 128
      dropout_input_keep_prob: 0.8
      dropout_output_keep_prob: 1.0
      num_layers: 1
  decoder.class: seq2seq.decoders.AttentionDecoder
  decoder.params:
    rnn_cell:
      cell_class: GRUCell
      cell_params:
        num_units: 128
      dropout_input_keep_prob: 0.8
      dropout_output_keep_prob: 1.0
      num_layers: 1
  optimizer.name: Adam
  optimizer.params:
    epsilon: 0.0000008
  optimizer.learning_rate: 0.0001
  source.max_seq_len: 50
  source.reverse: false
  target.max_seq_len: 50
I tried decreasing the batch size from 32 to 16, but it still did not help. What specific changes should I make to prevent my model from taking up the entirety of RAM and crashing (e.g. decreasing the data size, decreasing the number of stacked LSTM cells, further decreasing the batch size)?
My system runs Python 2.7.x, TensorFlow 1.1.0, and CUDA 8.0. It has an Nvidia GeForce GTX 1050 Ti (768 CUDA cores) with 4 GB of memory, plus 8 GB of RAM and a further 8 GB of swap.
Your model looks pretty small. The only thing that is kind of big is the training data. Please check that your get_batch() function has no bugs: it is possible that for each batch you are actually loading the whole dataset.
To quickly test this, just cut your training data down to something very small (such as 1/10 of the current size) and see if that helps. It should not make a difference if you are really using mini-batches, but if it does resolve the problem, fix your get_batch() function.
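For illustration, a generator-style get_batch() that streams the corpus instead of loading it all at once could look like the sketch below (the function name, the tab-separated file format, and the field handling are assumptions, not part of the seq2seq framework):

def get_batch(path, batch_size=16, max_len=50):
    """Yield (source, target) token batches lazily, reading the corpus line
    by line instead of holding the whole 1 GB file in RAM."""
    sources, targets = [], []
    with open(path) as f:
        for line in f:
            src, tgt = line.rstrip("\n").split("\t")   # assumed tab-separated pairs
            sources.append(src.split()[:max_len])
            targets.append(tgt.split()[:max_len])
            if len(sources) == batch_size:
                yield sources, targets
                sources, targets = [], []
    if sources:                                         # final, possibly smaller batch
        yield sources, targets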