Cannot serialize protocol buffer of type tensorflow.GraphDef as the serialized size (3459900923 bytes) would be larger than the limit (2147483647 bytes)

We are attempting to train a network on knee MRIs through NiftyNet. We have a spatial window_size = (400,400,400) with pixdim = (0.4,0.4,0.4). When we run these images with a smaller window size (for example 160,160,160) there is no problem and it works quite well; however, when we increase the window_size to get higher-resolution outputs we get an error: Cannot serialize protocol buffer of type tensorflow.GraphDef as the serialized size (3459900923 bytes) would be larger than the limit (2147483647 bytes).
This is due to a limit in protobuf: NiftyNet/TensorFlow serialize the graph into a message whose size is capped at the signed 32-bit maximum, 2^31 - 1 = 2147483647 bytes. At the same time I have heard that protobuf should really be able to cope with uint64, which would handle a much larger number. Do you know if this can be changed in TensorFlow/NiftyNet?
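For reference, a minimal TF1-style sketch (my own illustration, not NiftyNet code) that measures how large the serialized GraphDef actually is, assuming the graph of interest is the default graph of the current process:
import tensorflow as tf

# Sketch: compare the serialized GraphDef size against protobuf's 2 GiB message limit.
graph_def = tf.get_default_graph().as_graph_def()
size = graph_def.ByteSize()          # serialized size of the protobuf message, in bytes
limit = 2**31 - 1                    # 2147483647 bytes
print("GraphDef size: %d bytes (limit %d)" % (size, limit))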

Related

What does "max_batch_size" mean in tensorflow-serving batching_config.txt?

I'm using tensorflow-serving on GPUs with --enable-batching=true.
However, I'm a little confused about max_batch_size in batching_config.txt.
My client sends an input tensor with shape [-1, 1000] in a single gRPC request, where dim0 ranges over (0, 200]. I set max_batch_size = 100 and receive errors like:
"gRPC call return code: 3:Task size 158 is larger than maximum batch
size 100"
"gRPC call return code: 3:Task size 162 is larger than maximum batch
size 100"
It looks like max_batch_size limits dim0 of a single request, but since TensorFlow Serving batches multiple requests into one batch, I thought it meant the total number of requests.
Here is a direct description from the docs.
max_batch_size: The maximum size of any batch. This parameter governs the throughput/latency tradeoff, and also avoids having batches that are so large they exceed some resource constraint (e.g. GPU memory to hold a batch's data).
In ML, the first dimension most of the time represents the batch. So based on my understanding, TensorFlow Serving treats the value of the first dimension as the batch size and issues an error whenever it is larger than the allowed value. You can verify this by issuing a few requests where you manually keep the first dimension below 100; I expect that to remove the error.
After that you can modify your inputs to be sent in a proper format.
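One way to do that, sketched below under my own assumptions (the send_fn helper is a placeholder for whatever builds and issues your gRPC PredictRequest), is to split the [-1, 1000] input along dim0 into chunks no larger than max_batch_size:
import numpy as np

MAX_BATCH_SIZE = 100  # should match max_batch_size in batching_config.txt

def send_in_chunks(inputs, send_fn):
    # Split an [N, 1000] array along dim0 and send each chunk as its own request.
    results = []
    for start in range(0, inputs.shape[0], MAX_BATCH_SIZE):
        results.append(send_fn(inputs[start:start + MAX_BATCH_SIZE]))
    return results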

TF API Dataset: initialization

The tf.data Dataset API works really great; I was able to speed up learning ~2x. But I still have a performance problem: GPU utilization is low (despite using tf.data with several workers).
My use case is the following:
~400 training examples, each with 10 input channels (taking ~5 GB)
The task is segmentation using ResNet50. A forward-backward pass takes ~0.15 s. Batch size = 32.
Data loading is fast, taking ~0.06 s.
But after one epoch (400/32 ≈ 13 iterations), data loading takes ~3.5 seconds, the same as the initialization of the loader (which is more than it takes to process the whole epoch). This makes learning very slow.
My question is: is there an option to eliminate the re-initialization after each epoch and just feed the data continuously?
I tried setting dataset.repeat(10) but it does not help.
The loading and training code is here: https://gist.github.com/melgor/0e681a4fe8f125d25573aa30d8ace5f3
The model is just ResNet transformed into an Encoder-Decoder architecture for image segmentation. Most of the code is taken from https://github.com/argman/EAST, but since loading there is very slow, I would like to switch it to TFRecords.
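For reference, a rough sketch of the kind of TF1-style tf.data pipeline in question (parse_fn, the 'image/encoded' feature key and the file name are placeholders, not the actual code from the gist):
import tensorflow as tf

def parse_fn(serialized):
    # Placeholder parser: decode one serialized tf.train.Example into an image tensor.
    feats = tf.parse_single_example(
        serialized, {'image/encoded': tf.FixedLenFeature([], tf.string)})
    return tf.image.decode_jpeg(feats['image/encoded'], channels=3)

dataset = (tf.data.TFRecordDataset(['train.tfrecord'])   # placeholder file name
           .repeat()                      # repeat indefinitely instead of re-initializing each epoch
           .map(parse_fn, num_parallel_calls=4)
           .shuffle(buffer_size=400)
           .batch(32)                     # assumes all images share one shape
           .prefetch(1))
images = dataset.make_one_shot_iterator().get_next()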
I partly resolved my problem with the long initialization: I just made the tfrecord file smaller.
In my base implementation I used raw strings as images (i.e. the string from the numpy array). The new tfrecord contains images compressed as JPEG or PNG. Thanks to that, the file is 50x smaller, which makes initialization much faster. But there is also a downside: your images need to be uint8 (JPEG) or uint16 (PNG). In the case of float data you can use uint16, but there will be a loss of information.
For encoding a numpy array to a compressed string you can use TensorFlow itself:
# img must be a uint8 HxWx3 array, png_image uint8 or uint16; sess is an open tf.Session()
encoded_jpeg = tf.image.encode_jpeg(tf.constant(img), format='rgb').eval(session=sess)
encoded_png = tf.image.encode_png(tf.constant(png_image)).eval(session=sess)
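A minimal sketch, under the same assumptions, of writing those pre-encoded bytes into a tfrecord file (the 'image/encoded' feature key and the file name are placeholders):
import tensorflow as tf

def _bytes_feature(value):
    # Wrap a raw byte string as a tf.train.Feature.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

# encoded_jpeg is the compressed string produced above.
with tf.python_io.TFRecordWriter('train.tfrecord') as writer:
    example = tf.train.Example(features=tf.train.Features(
        feature={'image/encoded': _bytes_feature(encoded_jpeg)}))
    writer.write(example.SerializeToString())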

Avoiding exhausting GPU resources in convNN Tensorflow

I'm trying to run a hyperparameter optimization script for a convNN using Tensorflow.
As you may know, TF's handling of GPU memory isn't that fancy (and I don't think it ever will be, thanks to the TPU). So my question is: how do I choose the filter dimensions and the batch size so that GPU memory doesn't get exhausted?
Here's the equation that I'm thinking of:
image_shape = 128x128x3 (3 color channels)
batch_size = 20 (the smallest possible batch size, since I have 20 classes)
filter_shape = fw_fh_fd [filter_width=4, filter_height=4, filter_depth=32]
As far as I understand, using the tf.nn.conv2d function will need the following amount of memory:
image_width * image_height * num_channels * batch_size * filter_height * filter_width * filter_depth * 32 bits
since we're using the tf.float32 type for each pixel.
In the given example, the needed memory will be:
128 x 128 x 3 x 20 x 4 x 4 x 32 x 32 = 16106127360 bits, which is about 2 GB of memory.
I'm not sure the formula is correct, so I hope to get a validation or a correction of what I'm missing.
Actually, this will take only about 44MB of memory, mostly taken by the output.
Your input is 20x128x128x3
The convolution kernel is 4x4x3x32
The output is 20x128x128x32
When you sum up the total, you get
(20*128*128*3 + 4*4*3*32 + 20*128*128*32) * 4 / 1024**2 ≈ 44MB
(In the above, 4 is for the size in bytes of float32 and 1024**2 is to get the result in MB).
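The same arithmetic in a few lines of Python, in case you want to plug in other shapes:
# Rough activation-memory estimate: input + kernel + output, float32 (4 bytes per element).
batch, h, w, c_in, k, c_out = 20, 128, 128, 3, 4, 32
input_elems = batch * h * w * c_in         # 20x128x128x3
kernel_elems = k * k * c_in * c_out        # 4x4x3x32
output_elems = batch * h * w * c_out       # 20x128x128x32 (stride 1, 'SAME' padding)
total_mb = (input_elems + kernel_elems + output_elems) * 4 / 1024.0**2
print("%.1f MB" % total_mb)                # ~43.8 MB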
Your batch size can be smaller than your number of classes. Think about ImageNet and its 1000 classes: people are training with batch sizes 10 times smaller.
EDIT
Here is a tensorboard screenshot of the net — it reports 40MB rather than 44MB, probably because it excludes the input — and you also have all the tensor sizes I mentioned earlier.

Field: version.deployment_uri Error: The total size of files in gs://my-bucket/ml/ is x bytes, which exceeds the allowed maximum of 1073741824 bytes

When trying to create a new version in the google cloud console, I get an error like,
Field: version.deployment_uri Error: The total size of files in gs://my-bucket/ml/ is 2150116163 bytes, which exceeds the allowed maximum of 1073741824 bytes.
My model is an RNN model. I believe the embedding (i.e. the vocab size) is likely the cause of the large model.
Is there a quota setting that can be adjusted for larger models?
Unfortunately, that limit is not adjustable at this time, although it may be in the future.
Are you comfortable sharing how large your model is? That information is valuable for us for planning purposes.
In the meantime, you will need to adjust the vocab and embedding sizes or otherwise reduce the size of the model.
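As a rough sanity check (with made-up numbers, since the actual vocab and embedding sizes aren't given here), the embedding table alone contributes about vocab_size * embedding_dim * 4 bytes of float32 weights:
# Back-of-the-envelope embedding size; vocab_size and embedding_dim are hypothetical.
vocab_size = 1000000
embedding_dim = 512
embedding_bytes = vocab_size * embedding_dim * 4           # float32 weights
print("%.2f GiB" % (embedding_bytes / 1024.0**3))          # ~1.91 GiB, already over the 1073741824-byte limit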

Size in MemoryRequirements not what I'm expecting

I'm creating a texture, querying the memory requirements, and it's not what I was expecting. Here's the ImageCreateInfo structure:
ImageCreateInfo()
.X2D(1024, 1024)
.Format(Format::R8G8B8_UNORM)
.InitialLayout(ImageLayout::PREINITIALIZED)
.Tiling(ImageTiling::LINEAR)
.Usage(ImageUsageFlagBits::TRANSFER_SRC);
Now, I was expecting one byte for each of R, G, B, at a width and height of 1024, giving memory requirements of 3 * 1024 * 1024 = 3,145,728 bytes. But instead it returns 1,048,576, which is exactly 1024 * 1024. It seems not to care about the one byte for each channel of RGB. What am I missing here?
You're right in that this should return 3,145,728 bytes, but is the R8G8B8_UNORM format actually available on your implementation? If not, you won't get a correct allocation size because you actually are not going to be able to use that image anyway.
If you enable validation layers this should throw an error from the image validation layers btw.
At least on the GPU I'm on right now, it's not supported for any of the tiling modes or as a buffer format, but e.g. R8G8B8A8 or R8G8 are available and return the correct allocation size.
If R8G8B8 is actually available on your GPU could you post your complete VkImageCreateInfo structure, including number of mips and layers?
So a good idea would be to check if the image format you request (and want to allocate for) is actually supported for your use case (linear, optimal, buffer).