Field: version.deployment_uri Error: The total size of files in gs://my-bucket/ml/ is x bytes, which exceeds the allowed maximum of 1073741824 bytes - tensorflow

When trying to create a new version in the Google Cloud console, I get an error like:
Field: version.deployment_uri Error: The total size of files in gs://my-bucket/ml/ is 2150116163 bytes, which exceeds the allowed maximum of 1073741824 bytes.
My model is an RNN. I believe the embedding size and vocabulary size are likely the cause of the large model.
Is there a quota setting that can be adjusted for larger models?

Unfortunately, that limit is not adjustable at this time, although it may be in the future.
Are you comfortable sharing how large your model is? That information is valuable for us for planning purposes.
In the meantime, you will need to adjust the vocab and embedding sizes or otherwise reduce the size of the model.
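As a rough sanity check, here is a minimal sketch of how the embedding table alone can dominate the exported model size; the vocab_size and embedding_dim values below are placeholders, not numbers from the question.

# Rough estimate of the embedding table's contribution to the exported model.
# float32 weights are 4 bytes each; vocab_size and embedding_dim are placeholders.
vocab_size = 500000
embedding_dim = 512
embedding_bytes = vocab_size * embedding_dim * 4
print('embedding table alone: %.2f GiB (deployment limit is 1 GiB)'
      % (embedding_bytes / 1024**3))

With these placeholder numbers the table alone is already about 0.95 GiB, which is why shrinking the vocabulary or the embedding dimension is usually the quickest way back under the limit.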

Related

Cannot serialize protocol buffer of type tensorflow.GraphDef as the serialized size 3459900923 bytes would be larger than the limit (2147483647 bytes)

We are attempting to train a network on knee MRI scans with NiftyNet. We have a spatial window_size = (400, 400, 400) with pixdim = (0.4, 0.4, 0.4). When we run these images with a lower window size (for example 160, 160, 160) there is no problem and it works quite well; however, when we increase the window_size to achieve higher-resolution outputs we get an error: Cannot serialize protocol buffer of type tensorflow.GraphDef as the serialized size (3459900923 bytes) would be larger than the limit (2147483647 bytes).
This is due to a limit in protobuf: NiftyNet / TensorFlow serialize the graph with a signed 32-bit size, whose maximum is 2^31 - 1 = 2147483647. At the same time, I have heard that protobuf should really be able to cope with uint64, which would handle a much larger number. Do you know if this can be changed in TensorFlow/NiftyNet?
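If it helps to see how close a graph is to that cap, here is a minimal diagnostic sketch (TF 1.x style, since NiftyNet builds on it); it assumes the GraphDef can still be materialized in Python and only measures its size, it does not raise the limit.

import tensorflow as tf

# Measure how large the serialized GraphDef would be. The cap comes from
# protobuf's signed 32-bit size field (2**31 - 1 bytes).
graph_def = tf.compat.v1.get_default_graph().as_graph_def()
print('GraphDef size: %d bytes (limit: %d bytes)'
      % (graph_def.ByteSize(), 2**31 - 1))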

What does "max_batch_size" mean in tensorflow-serving batching_config.txt?

I'm using tensorflow-serving on GPUs with --enable-batching=true.
However, I'm a little confused with max_batch_size in batching_config.txt.
My client sends an input tensor with shape [-1, 1000] in a single gRPC request, where dim 0 ranges over (0, 200]. I set max_batch_size = 100 and receive errors:
"gRPC call return code: 3:Task size 158 is larger than maximum batch size 100"
"gRPC call return code: 3:Task size 162 is larger than maximum batch size 100"
It looks like max_batch_size limits dim 0 of a single request, but since TensorFlow Serving batches multiple requests together into one batch, I thought it meant the number of requests summed into a batch.
Here is a direct description from the docs.
max_batch_size: The maximum size of any batch. This parameter governs
the throughput/latency tradeoff, and also avoids having batches that
are so large they exceed some resource constraint (e.g. GPU memory to
hold a batch's data).
In ML, the first dimension usually represents the batch. So, based on my understanding, TensorFlow Serving treats the value of the first dimension as the batch size and raises an error whenever it is larger than the allowed value. You can verify this by issuing some requests where you manually keep the first dimension below 100; I expect this to remove the error.
After that, you can modify your inputs to be sent in the proper format, for example by splitting a large request into several smaller ones as sketched below.
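Here is a minimal client-side sketch of that splitting, assuming a gRPC PredictionService on localhost:8500; the model name 'my_model', signature 'serving_default', and input key 'input' are placeholders rather than values from the question.

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

def predict_in_chunks(features, max_batch_size=100):
    # features: np.ndarray of shape [N, 1000]; split along dim 0 so each
    # request stays within the server-side max_batch_size.
    responses = []
    for start in range(0, features.shape[0], max_batch_size):
        chunk = features[start:start + max_batch_size]
        request = predict_pb2.PredictRequest()
        request.model_spec.name = 'my_model'            # placeholder model name
        request.model_spec.signature_name = 'serving_default'
        request.inputs['input'].CopyFrom(               # placeholder input key
            tf.make_tensor_proto(chunk, dtype=tf.float32))
        responses.append(stub.Predict(request, 10.0))   # 10 second timeout
    return responses

responses = predict_in_chunks(np.random.rand(158, 1000).astype(np.float32))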

how to make embedding column through features directly?

I'm learning the wide & deep model for CTR. My data has a feature user_id with more than 2**26 values. How can I get an embedding column from this feature? I used
user_id = tf.feature_column.categorical_column_with_hash_bucket('user_id', hash_bucket_size=2**26)
user_id_emb = tf.feature_column.embedding_column(user_id, dimension=95)
but it runs out of memory.
So, 2**26 is about 64M. You want 95 embedding dimensions. Each will be a float32 by default. That is 4 bytes. 4 * 95 ~= 400 bytes per user_id. So you need 64M * 400 ~= 25.6 Gbytes of memory to store the embedding.
Make sure you can allocate that much on your system; it should all be in RAM (swap will make everything much slower). If you place this on a GPU it won't work, since most GPUs don't have that much memory available. An embedding of only 20 dimensions would use about 5 GB, which is more likely to fit in memory.
The easiest thing is to lower the number of embedding dimensions.
If you have multiple systems available you can shard the embedding (see partitioner parameter for variable related functions).
Another thing you can do is cluster some user_ids together (lower the hash_bucket_size). Or replace user_ids by a combination of other features that would describe the user sufficiently for your model.
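Here is a minimal sketch of a reduced configuration together with the back-of-the-envelope memory estimate; the hash_bucket_size of 2**22 is an assumption chosen to illustrate the trade-off, not a value from the answer.

import tensorflow as tf

# Back-of-the-envelope size of the embedding table (float32 = 4 bytes per weight).
hash_bucket_size = 2**22   # assumed reduction from 2**26: more hash collisions, far less memory
embedding_dim = 20         # reduced from 95, as suggested above
table_gb = hash_bucket_size * embedding_dim * 4 / 1024**3
print('embedding table: %.2f GB' % table_gb)   # ~0.31 GB instead of ~25 GB

user_id = tf.feature_column.categorical_column_with_hash_bucket(
    'user_id', hash_bucket_size=hash_bucket_size)
user_id_emb = tf.feature_column.embedding_column(user_id, dimension=embedding_dim)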

Avoiding exhausting GPU resources in convNN Tensorflow

I'm trying to run a hyperparameter optimization script for a ConvNet using TensorFlow.
As you may know, TF's handling of GPU memory isn't that fancy (and I don't think it ever will be, thanks to the TPU). So my question is: how do I choose the filter dimensions and the batch size so that the GPU memory doesn't get exhausted?
Here's the equation that I'm thinking of:
image_shape = 128x128x3 (3 color channels)
batchSize = 20 (the smallest possible batch size, since I have 20 classes)
filter_shape = fw_fh_fd [filter_width=4, filter_height=4, filter_depth=32]
As far as I understand, using the tf.nn.conv2d function will need the following amount of memory:
image_width * image_height * number_of_channels * batchSize * filter_height * filter_width * filter_depth * 32 bits
since we're using tf.float32 for each pixel.
In the given example, the needed memory will be:
128 x 128 x 3 x 20 x 4 x 4 x 32 x 32 = 16106127360 (bits), which is almost 16 GB of memory.
I'm not sure the formula is correct, so I hope to get a validation or a correction of what I'm missing.
Actually, this will take only about 44MB of memory, mostly taken by the output.
Your input is 20x128x128x3
The convolution kernel is 4x4x3x32
The output is 20x128x128x32
When you sum up the total, you get
(20*128*128*3 + 4*4*3*32 + 20*128*128*32) * 4 / 1024**2 ≈ 44MB
(In the above, 4 is for the size in bytes of float32 and 1024**2 is to get the result in MB).
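For reference, that arithmetic as a runnable sketch:

# Back-of-the-envelope check of the tensor sizes quoted above (float32 = 4 bytes).
input_elems = 20 * 128 * 128 * 3     # NHWC input batch
kernel_elems = 4 * 4 * 3 * 32        # conv kernel [height, width, in_channels, out_channels]
output_elems = 20 * 128 * 128 * 32   # output with 'SAME' padding and stride 1
total_mb = (input_elems + kernel_elems + output_elems) * 4 / 1024**2
print('approx. memory for input + kernel + output: %.1f MB' % total_mb)   # ~43.8 MB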
Your batch size can be smaller than your number of classes. Think about ImageNet and its 1000 classes: people are training with batch sizes 10 times smaller.
EDIT
Here is a TensorBoard screenshot of the net; it reports 40 MB rather than 44 MB, probably because it excludes the input, and it also shows all the tensor sizes mentioned earlier.

Size in MemoryRequirements not what I'm expecting

I'm creating a texture, querying the memory requirements, and it's not what I was expecting. Here's the ImageCreateInfo structure:
ImageCreateInfo()
.X2D(1024, 1024)
.Format(Format::R8G8B8_UNORM)
.InitialLayout(ImageLayout::PREINITIALIZED)
.Tiling(ImageTiling::LINEAR)
.Usage(ImageUsageFlagBits::TRANSFER_SRC);
Now, I was expecting one byte for each of R, G, B, at a width and height of 1024, to give memory requirements of 3 * 1024 * 1024 = 3,145,728 bytes. But instead, it returns 1,048,576, which is exactly 1024 * 1024. It seems not to care about the one byte for each channel of RGB. What am I missing here?
You're right that this should return 3,145,728 bytes, but is the R8G8B8_UNORM format actually available on your implementation? If not, you won't get a correct allocation size, because you're not going to be able to use that image anyway.
By the way, if you enable the validation layers, this should trigger an error from the image validation layer.
At least on the GPU I'm on right now it's not supported for any of the tiling modes or as a buffer format, but e.g. R8G8B8A8 or R8G8 are available and return the correct allocation size.
If R8G8B8 is actually available on your GPU, could you post your complete VkImageCreateInfo structure, including the number of mips and layers?
So a good idea would be to check if the image format you request (and want to allocate for) is actually supported for your use case (linear, optimal, buffer).