GPU Memory Spiking in Keras [duplicate] - tensorflow
I work in an environment in which computational resources are shared, i.e., we have a few server machines equipped with a few Nvidia Titan X GPUs each.
For small to moderate size models, the 12 GB of the Titan X is usually enough for 2–3 people to run training concurrently on the same GPU. If the models are small enough that a single model does not take full advantage of all the computational units of the GPU, this can actually result in a speedup compared with running one training process after the other. Even in cases where the concurrent access to the GPU does slow down the individual training time, it is still nice to have the flexibility of having multiple users simultaneously train on the GPU.
The problem with TensorFlow is that, by default, it allocates the full amount of available GPU memory when it is launched. Even for a small two-layer neural network, I see that all 12 GB of the GPU memory is used up.
Is there a way to make TensorFlow only allocate, say, 4 GB of GPU memory, if one knows that this is enough for a given model?
You can set the fraction of GPU memory to be allocated when you construct a tf.Session by passing a tf.GPUOptions as part of the optional config argument:
# Assume that you have 12GB of GPU memory and want to allocate ~4GB:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
The per_process_gpu_memory_fraction acts as a hard upper bound on the amount of GPU memory that will be used by the process on each GPU on the same machine. Currently, this fraction is applied uniformly to all of the GPUs on the same machine; there is no way to set this on a per-GPU basis.
config = tf.ConfigProto()
config.gpu_options.allow_growth=True
sess = tf.Session(config=config)
https://github.com/tensorflow/tensorflow/issues/1578
For TensorFlow 2.0 and 2.1 (docs):
import tensorflow as tf
tf.config.gpu.set_per_process_memory_growth(True)
For TensorFlow 2.2+ (docs):
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
tf.config.experimental.set_memory_growth(gpu, True)
The docs also list some more methods:
Set environment variable TF_FORCE_GPU_ALLOW_GROWTH to true.
Use tf.config.experimental.set_virtual_device_configuration to set a hard limit on a Virtual GPU device.
Here is an excerpt from the Book Deep Learning with TensorFlow
In some cases it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as it is needed by the process. TensorFlow provides two configuration options on the session to control this. The first is the allow_growth option, which attempts to allocate only as much GPU memory based on runtime allocations, it starts out allocating very little memory, and as sessions get run and more GPU memory is needed, we extend the GPU memory region needed by the TensorFlow process.
1) Allow growth: (more flexible)
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
The second method is per_process_gpu_memory_fraction option, which determines the fraction of the overall amount of memory that each visible GPU should be allocated. Note: No release of memory needed, it can even worsen memory fragmentation when done.
2) Allocate fixed memory:
To only allocate 40% of the total memory of each GPU by:
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)
Note:
That's only useful though if you truly want to bind the amount of GPU memory available on the TensorFlow process.
For Tensorflow version 2.0 and 2.1 use the following snippet:
import tensorflow as tf
gpu_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpu_devices[0], True)
For prior versions , following snippet used to work for me:
import tensorflow as tf
tf_config=tf.ConfigProto()
tf_config.gpu_options.allow_growth=True
sess = tf.Session(config=tf_config)
All the answers above assume execution with a sess.run() call, which is becoming the exception rather than the rule in recent versions of TensorFlow.
When using the tf.Estimator framework (TensorFlow 1.4 and above) the way to pass the fraction along to the implicitly created MonitoredTrainingSession is,
opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
conf = tf.ConfigProto(gpu_options=opts)
trainingConfig = tf.estimator.RunConfig(session_config=conf, ...)
tf.estimator.Estimator(model_fn=...,
config=trainingConfig)
Similarly in Eager mode (TensorFlow 1.5 and above),
opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
conf = tf.ConfigProto(gpu_options=opts)
tfe.enable_eager_execution(config=conf)
Edit: 11-04-2018
As an example, if you are to use tf.contrib.gan.train, then you can use something similar to bellow:
tf.contrib.gan.gan_train(........, config=conf)
You can use
TF_FORCE_GPU_ALLOW_GROWTH=true
in your environment variables.
In tensorflow code:
bool GPUBFCAllocator::GetAllowGrowthValue(const GPUOptions& gpu_options) {
const char* force_allow_growth_string =
std::getenv("TF_FORCE_GPU_ALLOW_GROWTH");
if (force_allow_growth_string == nullptr) {
return gpu_options.allow_growth();
}
Tensorflow 2.0 Beta and (probably) beyond
The API changed again. It can be now found in:
tf.config.experimental.set_memory_growth(
device,
enable
)
Aliases:
tf.compat.v1.config.experimental.set_memory_growth
tf.compat.v2.config.experimental.set_memory_growth
References:
https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/config/experimental/set_memory_growth
https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth
See also:
Tensorflow - Use a GPU: https://www.tensorflow.org/guide/gpu
for Tensorflow 2.0 Alpha see: this answer
All the answers above refer to either setting the memory to a certain extent in TensorFlow 1.X versions or to allow memory growth in TensorFlow 2.X.
The method tf.config.experimental.set_memory_growth indeed works for allowing dynamic growth during the allocation/preprocessing. Nevertheless one may like to allocate from the start a specific-upper limit GPU memory.
The logic behind allocating a specific GPU memory would also be to prevent OOM memory during training sessions. For example, if one trains while opening video-memory consuming Chrome tabs/any other video consumption process, the tf.config.experimental.set_memory_growth(gpu, True) could result in OOM errors thrown, hence the necessity of allocating from the start more memory in certain cases.
The recommended and correct way in which to allot memory per GPU in TensorFlow 2.X is done in the following manner:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
# Restrict TensorFlow to only allocate 1GB of memory on the first GPU
try:
tf.config.experimental.set_virtual_device_configuration(
gpus[0],
[tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)]
Shameless plug: If you install the GPU supported Tensorflow, the session will first allocate all GPUs whether you set it to use only CPU or GPU. I may add my tip that even you set the graph to use CPU only you should set the same configuration(as answered above:) ) to prevent the unwanted GPU occupation.
And in an interactive interface like IPython and Jupyter, you should also set that configure, otherwise, it will allocate all memory and leave almost none for others. This is sometimes hard to notice.
If you're using Tensorflow 2 try the following:
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
For Tensorflow 2.0 this this solution worked for me. (TF-GPU 2.0, Windows 10, GeForce RTX 2070)
physical_devices = tf.config.experimental.list_physical_devices('GPU')
assert len(physical_devices) > 0, "Not enough GPU hardware devices available"
tf.config.experimental.set_memory_growth(physical_devices[0], True)
# allocate 60% of GPU memory
from keras.backend.tensorflow_backend import set_session
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.6
set_session(tf.Session(config=config))
this code has worked for me:
import tensorflow as tf
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.InteractiveSession(config=config)
Well I am new to tensorflow, I have Geforce 740m or something GPU with 2GB ram, I was running mnist handwritten kind of example for a native language with training data containing of 38700 images and 4300 testing images and was trying to get precision , recall , F1 using following code as sklearn was not giving me precise reults. once i added this to my existing code i started getting GPU errors.
TP = tf.count_nonzero(predicted * actual)
TN = tf.count_nonzero((predicted - 1) * (actual - 1))
FP = tf.count_nonzero(predicted * (actual - 1))
FN = tf.count_nonzero((predicted - 1) * actual)
prec = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * prec * recall / (prec + recall)
plus my model was heavy i guess, i was getting memory error after 147, 148 epochs, and then I thought why not create functions for the tasks so I dont know if it works this way in tensrorflow, but I thought if a local variable is used and when out of scope it may release memory and i defined the above elements for training and testing in modules, I was able to achieve 10000 epochs without any issues, I hope this will help..
i tried to train unet on voc data set but because of huge image size, memory finishes. i tried all the above tips, even tried with batch size==1, yet to no improvement. sometimes TensorFlow version also causes the memory issues. try by using
pip install tensorflow-gpu==1.8.0
Related
Puzzled by OOM Error on GPU when using 15595MiB / 16125MiB
I am using Tensorflow 2.X. My GPU memory is 16125MiB, but my model requires 15595MiB, according to nvidia-smi With this total usage, I get an OOM after some time, even when setting the minimum batch size. I also tried the following, but as soon as more memory is required, I still go out of memory: config = tf.compat.v1.ConfigProto() #config.gpu_options.per_process_gpu_memory_fraction = 0.8 config.gpu_options.allow_growth = True sess = tf.compat.v1.Session(config=config) The total memory value should also include the (150MiB) from firefox, which I need to use once in a while, but I doubt killing it would make much difference. My dataset is loaded through individual .h5 files (one per data sample), the intermediate computation arrays/tensors are deleted when not used. Unfortunately, I cannot resize the model for several reasons due to an ongoing publication. Is there any trick I can use to gain that extra 500MiB, to train my model safely, without incurring into OOM half-way?
Tensorflow: Is it normal that my GPU is using all its Memory but is not under full load?
I am currently trying to run a text-based sequence to sequence model using tensorflow 2.6 and CuDNN. The code is running, but taking suspiciously long. When I check my Task Manager, I see the following: This looks weird to me, because all memory is taking but it's not under heavy load. Is this expected behaviour? System: Windows 10 Python 3.9.9 Tensorflow & Keras 2.6 CUDA 11.6 CuDNN 8.3 NVidia RTX 3080ti In the code I found the following settings for the GPU def get_gpu_config(): gconfig = tf.compat.v1.ConfigProto() gconfig.gpu_options.per_process_gpu_memory_fraction = 0.975 # Don't take 100% of the memory gconfig.allow_soft_placement = True # Does not aggressively take all the GPU memory gconfig.gpu_options.allow_growth = True # Take more memory when necessary return gconfig My python output tells me it found my graphics card: And it is also visible in my nvidia-smi output: Am I maybe missing a configuration? The times it takes are similar to what I got on a CPU system, which seems off to me. Sidenote: The Code I try to run had to be migrated from tensorflow-gpu 1.12, but that went "relatively" smooth.
Yes this behaviour is normal for TensorFlow! From the TensorFlow docs By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. This is done to more efficiently use the relatively precious GPU memory resources on the devices by reducing memory fragmentation. To limit TensorFlow to a specific set of GPUs, use the tf.config.set_visible_devices method. If you don't want TensorFlow to allocate the totality of your VRAM, you can either set a hard limit on how much memory to use or tell TensorFlow to only allocate as much memory as needed. To set a hard limit Configure a virtual GPU device as follows: gpus = tf.config.list_physical_devices('GPU') if gpus: # Restrict TensorFlow to only allocate 1GB of memory on the first GPU try: tf.config.set_logical_device_configuration( gpus[0], [tf.config.LogicalDeviceConfiguration(memory_limit=1024)]) logical_gpus = tf.config.list_logical_devices('GPU') print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs") except RuntimeError as e: # Virtual devices must be set before GPUs have been initialized print(e) Only use as much as needed You can set the environment variable TF_FORCE_GPU_ALLOW_GROWTH=true OR Use tf.config.experimental.set_memory_growth as follows: gpus = tf.config.list_physical_devices('GPU') if gpus: try: # Currently, memory growth needs to be the same across GPUs for gpu in gpus: tf.config.experimental.set_memory_growth(gpu, True) logical_gpus = tf.config.list_logical_devices('GPU') print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs") except RuntimeError as e: # Memory growth must be set before GPUs have been initialized print(e) All code and information here is taken from https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth
Tensorflow-GPU 2.4 VRAM issue
I am trying to run tensorflow-gpu version 2.4.0-dev20200828 (a tf-nightly build) for a convolutional neural network implementation. Some other details: The version of python is Python 3.8.5. Running Windows 10 Using an nVidia RTX 2080 which has 8 GB VRAM Cuda Version 11.1 The following snippet is what I run: import tensorflow as tf from tensorflow import keras gpus = tf.config.experimental.list_physical_devices('GPU') if gpus: try: tf.config.experimental.set_virtual_device_configuration( gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)]) logical_gpus = tf.config.experimental.list_logical_devices('GPU') print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs") except RuntimeError as e: # Virtual devices must be set before GPUs have been initialized print(e) vgg_16 = keras.applications.VGG16(include_top=False, input_shape=(600, 600, 3)) random_image = np.random.rand(1, 600, 600, 3) output = vgg_16(random_image) The code for the memory configuration was taken from answers from here The issue I am having is that my GPU has 8GB of VRAM, and I need to be able to run the CNN with relatively large image batch sizes. The example is executed on a single image, but surprisingly I seem to only be able to increase the batch size to about 2-3 600 by 600 images. The code taken as per the comments says that it: Restrict TensorFlow to only allocate 1GB of memory on the first GPU, which is clearly not ideal. On the one hand if I allocate more, say 4000MB, I get errors such as: E tensorflow/stream_executor/cuda/cuda_dnn.cc:325] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED If I leave it as 1024 MB, I get messages like: Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.25GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. Any insights/resources on how to understand this issue much appreciated. I'd be willing to switch to another version of tensorflow/python/cuda if necessary, but ultimately I just want to have a deeper understanding of what this issue is.
A better way to control memory usage is by letting memory growth. You should remove all the above codes about gpus and use this instead: for gpu in tf.config.experimental.list_physical_devices('GPU'): tf.config.experimental.set_memory_growth(gpu, True) Additionally, you can resize or crop the input image to smaller size to further reduce memory usage.
same code run on different versions of tensorflow but different gpu memory is allocated
In my two system(P40, CUDA9, CUDNN7), tf1.8 and tf1.12 are installed respectively, and the same piece of code runs in tf1.12 almost double the allocated gpu memory as in tf1.8. I wrote the following code to simplify the comparison. At this time in tf1.8, 1241MiB gpu memory is allocated and in tf1.12, 737MiB gpu memory is allocated. How could I optimize the gpu memory allocation in tf? Any suggestion would be appreciated. import tensorflow as tf config = tf.ConfigProto() config.gpu_options.allow_growth = True config.allow_soft_placement = True a=tf.get_variable('a',(100,100)) b=tf.get_variable('b',(10000,10000)) sess=tf.Session(config=config) sess.run(tf.global_variables_initializer())
Tensorflow OOM error when allocating resources to GPU [duplicate]
I work in an environment in which computational resources are shared, i.e., we have a few server machines equipped with a few Nvidia Titan X GPUs each. For small to moderate size models, the 12 GB of the Titan X is usually enough for 2–3 people to run training concurrently on the same GPU. If the models are small enough that a single model does not take full advantage of all the computational units of the GPU, this can actually result in a speedup compared with running one training process after the other. Even in cases where the concurrent access to the GPU does slow down the individual training time, it is still nice to have the flexibility of having multiple users simultaneously train on the GPU. The problem with TensorFlow is that, by default, it allocates the full amount of available GPU memory when it is launched. Even for a small two-layer neural network, I see that all 12 GB of the GPU memory is used up. Is there a way to make TensorFlow only allocate, say, 4 GB of GPU memory, if one knows that this is enough for a given model?
You can set the fraction of GPU memory to be allocated when you construct a tf.Session by passing a tf.GPUOptions as part of the optional config argument: # Assume that you have 12GB of GPU memory and want to allocate ~4GB: gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333) sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) The per_process_gpu_memory_fraction acts as a hard upper bound on the amount of GPU memory that will be used by the process on each GPU on the same machine. Currently, this fraction is applied uniformly to all of the GPUs on the same machine; there is no way to set this on a per-GPU basis.
config = tf.ConfigProto() config.gpu_options.allow_growth=True sess = tf.Session(config=config) https://github.com/tensorflow/tensorflow/issues/1578
For TensorFlow 2.0 and 2.1 (docs): import tensorflow as tf tf.config.gpu.set_per_process_memory_growth(True) For TensorFlow 2.2+ (docs): import tensorflow as tf gpus = tf.config.experimental.list_physical_devices('GPU') for gpu in gpus: tf.config.experimental.set_memory_growth(gpu, True) The docs also list some more methods: Set environment variable TF_FORCE_GPU_ALLOW_GROWTH to true. Use tf.config.experimental.set_virtual_device_configuration to set a hard limit on a Virtual GPU device.
Here is an excerpt from the Book Deep Learning with TensorFlow In some cases it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as it is needed by the process. TensorFlow provides two configuration options on the session to control this. The first is the allow_growth option, which attempts to allocate only as much GPU memory based on runtime allocations, it starts out allocating very little memory, and as sessions get run and more GPU memory is needed, we extend the GPU memory region needed by the TensorFlow process. 1) Allow growth: (more flexible) config = tf.ConfigProto() config.gpu_options.allow_growth = True session = tf.Session(config=config, ...) The second method is per_process_gpu_memory_fraction option, which determines the fraction of the overall amount of memory that each visible GPU should be allocated. Note: No release of memory needed, it can even worsen memory fragmentation when done. 2) Allocate fixed memory: To only allocate 40% of the total memory of each GPU by: config = tf.ConfigProto() config.gpu_options.per_process_gpu_memory_fraction = 0.4 session = tf.Session(config=config, ...) Note: That's only useful though if you truly want to bind the amount of GPU memory available on the TensorFlow process.
For Tensorflow version 2.0 and 2.1 use the following snippet: import tensorflow as tf gpu_devices = tf.config.experimental.list_physical_devices('GPU') tf.config.experimental.set_memory_growth(gpu_devices[0], True) For prior versions , following snippet used to work for me: import tensorflow as tf tf_config=tf.ConfigProto() tf_config.gpu_options.allow_growth=True sess = tf.Session(config=tf_config)
All the answers above assume execution with a sess.run() call, which is becoming the exception rather than the rule in recent versions of TensorFlow. When using the tf.Estimator framework (TensorFlow 1.4 and above) the way to pass the fraction along to the implicitly created MonitoredTrainingSession is, opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.333) conf = tf.ConfigProto(gpu_options=opts) trainingConfig = tf.estimator.RunConfig(session_config=conf, ...) tf.estimator.Estimator(model_fn=..., config=trainingConfig) Similarly in Eager mode (TensorFlow 1.5 and above), opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.333) conf = tf.ConfigProto(gpu_options=opts) tfe.enable_eager_execution(config=conf) Edit: 11-04-2018 As an example, if you are to use tf.contrib.gan.train, then you can use something similar to bellow: tf.contrib.gan.gan_train(........, config=conf)
You can use TF_FORCE_GPU_ALLOW_GROWTH=true in your environment variables. In tensorflow code: bool GPUBFCAllocator::GetAllowGrowthValue(const GPUOptions& gpu_options) { const char* force_allow_growth_string = std::getenv("TF_FORCE_GPU_ALLOW_GROWTH"); if (force_allow_growth_string == nullptr) { return gpu_options.allow_growth(); }
Tensorflow 2.0 Beta and (probably) beyond The API changed again. It can be now found in: tf.config.experimental.set_memory_growth( device, enable ) Aliases: tf.compat.v1.config.experimental.set_memory_growth tf.compat.v2.config.experimental.set_memory_growth References: https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/config/experimental/set_memory_growth https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth See also: Tensorflow - Use a GPU: https://www.tensorflow.org/guide/gpu for Tensorflow 2.0 Alpha see: this answer
All the answers above refer to either setting the memory to a certain extent in TensorFlow 1.X versions or to allow memory growth in TensorFlow 2.X. The method tf.config.experimental.set_memory_growth indeed works for allowing dynamic growth during the allocation/preprocessing. Nevertheless one may like to allocate from the start a specific-upper limit GPU memory. The logic behind allocating a specific GPU memory would also be to prevent OOM memory during training sessions. For example, if one trains while opening video-memory consuming Chrome tabs/any other video consumption process, the tf.config.experimental.set_memory_growth(gpu, True) could result in OOM errors thrown, hence the necessity of allocating from the start more memory in certain cases. The recommended and correct way in which to allot memory per GPU in TensorFlow 2.X is done in the following manner: gpus = tf.config.experimental.list_physical_devices('GPU') if gpus: # Restrict TensorFlow to only allocate 1GB of memory on the first GPU try: tf.config.experimental.set_virtual_device_configuration( gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)]
Shameless plug: If you install the GPU supported Tensorflow, the session will first allocate all GPUs whether you set it to use only CPU or GPU. I may add my tip that even you set the graph to use CPU only you should set the same configuration(as answered above:) ) to prevent the unwanted GPU occupation. And in an interactive interface like IPython and Jupyter, you should also set that configure, otherwise, it will allocate all memory and leave almost none for others. This is sometimes hard to notice.
If you're using Tensorflow 2 try the following: config = tf.compat.v1.ConfigProto() config.gpu_options.allow_growth = True session = tf.compat.v1.Session(config=config)
For Tensorflow 2.0 this this solution worked for me. (TF-GPU 2.0, Windows 10, GeForce RTX 2070) physical_devices = tf.config.experimental.list_physical_devices('GPU') assert len(physical_devices) > 0, "Not enough GPU hardware devices available" tf.config.experimental.set_memory_growth(physical_devices[0], True)
# allocate 60% of GPU memory from keras.backend.tensorflow_backend import set_session import tensorflow as tf config = tf.ConfigProto() config.gpu_options.per_process_gpu_memory_fraction = 0.6 set_session(tf.Session(config=config))
this code has worked for me: import tensorflow as tf config = tf.compat.v1.ConfigProto() config.gpu_options.allow_growth = True session = tf.compat.v1.InteractiveSession(config=config)
Well I am new to tensorflow, I have Geforce 740m or something GPU with 2GB ram, I was running mnist handwritten kind of example for a native language with training data containing of 38700 images and 4300 testing images and was trying to get precision , recall , F1 using following code as sklearn was not giving me precise reults. once i added this to my existing code i started getting GPU errors. TP = tf.count_nonzero(predicted * actual) TN = tf.count_nonzero((predicted - 1) * (actual - 1)) FP = tf.count_nonzero(predicted * (actual - 1)) FN = tf.count_nonzero((predicted - 1) * actual) prec = TP / (TP + FP) recall = TP / (TP + FN) f1 = 2 * prec * recall / (prec + recall) plus my model was heavy i guess, i was getting memory error after 147, 148 epochs, and then I thought why not create functions for the tasks so I dont know if it works this way in tensrorflow, but I thought if a local variable is used and when out of scope it may release memory and i defined the above elements for training and testing in modules, I was able to achieve 10000 epochs without any issues, I hope this will help..
i tried to train unet on voc data set but because of huge image size, memory finishes. i tried all the above tips, even tried with batch size==1, yet to no improvement. sometimes TensorFlow version also causes the memory issues. try by using pip install tensorflow-gpu==1.8.0