Size in MemoryRequirements is not what I'm expecting - Vulkan

I'm creating a texture, querying the memory requirements, and it's not what I was expecting. Here's the ImageCreateInfo structure:
ImageCreateInfo()
    .X2D(1024, 1024)
    .Format(Format::R8G8B8_UNORM)
    .InitialLayout(ImageLayout::PREINITIALIZED)
    .Tiling(ImageTiling::LINEAR)
    .Usage(ImageUsageFlagBits::TRANSFER_SRC);
Now, I was expecting one byte for each of R, G, B at a width and height of 1024, giving memory requirements of 3 * 1024 * 1024 = 3,145,728 bytes. But instead it returns 1,048,576, which is exactly 1024 * 1024. It seems not to account for the one byte per channel of RGB. What am I missing here?

You're right that this should return 3,145,728 bytes, but is the R8G8B8_UNORM format actually available on your implementation? If not, you won't get a correct allocation size, because you won't be able to use that image anyway.
If you enable the validation layers, this should also trigger an error from image validation, by the way.
At least on the GPU I'm on right now it's not supported for any of the tiling modes or as a buffer format, but e.g. R8G8B8A8 or R8G8 are available and return the correct allocation size.
If R8G8B8 is actually available on your GPU, could you post your complete VkImageCreateInfo structure, including the number of mips and layers?
So a good idea would be to check whether the image format you request (and want to allocate for) is actually supported for your use case (linear, optimal, buffer).
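For example, something along these lines with the raw C API (a minimal sketch: gpu stands for whatever VkPhysicalDevice you already selected, and the feature bit should match whatever your wrapper's Usage flag maps to):

#include <vulkan/vulkan.h>
#include <stdbool.h>
#include <stdio.h>

/* Returns true if `format` supports all `required` features for linear tiling.
   Check optimalTilingFeatures / bufferFeatures the same way for the other cases. */
static bool linear_format_supported(VkPhysicalDevice gpu, VkFormat format,
                                    VkFormatFeatureFlags required)
{
    VkFormatProperties props;
    vkGetPhysicalDeviceFormatProperties(gpu, format, &props);
    return (props.linearTilingFeatures & required) == required;
}

/* Usage sketch. VK_FORMAT_FEATURE_TRANSFER_SRC_BIT matches the TRANSFER_SRC usage
   above; it requires Vulkan 1.1 or VK_KHR_maintenance1. */
void check_r8g8b8(VkPhysicalDevice gpu)
{
    if (!linear_format_supported(gpu, VK_FORMAT_R8G8B8_UNORM,
                                 VK_FORMAT_FEATURE_TRANSFER_SRC_BIT)) {
        printf("R8G8B8_UNORM not usable with linear tiling here; "
               "fall back to e.g. VK_FORMAT_R8G8B8A8_UNORM\n");
    }
}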

Related

How to convert BGR TensorFlow Lite model to RGB?

I have a tflite model trained on BGR data. How can I make it work properly with RGB images?
UPDATE
I want to use it with the material-showcase app: https://github.com/googlesamples/mlkit/tree/master/android/material-showcase
@Farmaker @JaredJunyoungLim Thank you very much for your answers. I've updated the question. At first I was thinking about converting the model itself, so it wouldn't require any changes in the code. For example, the converter to the OpenVINO format has an option to reverse input channels. I have also tried to set the BGR ColorSpace in the metadata, but have found out that it's most probably not possible.
I guess I'll go with your suggestion then. In the linked code there is indeed a ByteBuffer (FrameProcessorBase.kt). I guess this is the place to change the order of the channels (after line 70):
val frame = processingFrame ?: return
However, how can I change the order of the channels if this is just a ByteBuffer? Do I need to figure out the way the data is stored in it? For example, is it R,G,B,R,G,B,R,G,B,... for every pixel? Or maybe there is some more elegant way to do that?
I can see that the format is set to IMAGE_FORMAT_NV21, which is YCrCb.
UPDATE 2
From what I've tested with
Log.d("ByteBuffer", frame.toString())
it seems that the ByteBuffer takes 1.5 bytes per pixel:
java.nio.HeapByteBuffer[pos=0 lim=3110401 cap=3110401]
(Resolution: 1920x1080; 3110400/1920/1080=1.5)
So it uses 12 bits per pixel, which would mean 4 bits per channel per pixel. That's a bit strange, because I would expect at least 8 bits per channel per pixel (0-255).
So I guess that maybe it's compressed.
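(For context on that arithmetic: NV21 is a 4:2:0 YUV layout, i.e. a full-resolution Y plane followed by interleaved V/U samples shared by each 2x2 block of pixels, which is where the 1.5 bytes per pixel come from. A small Kotlin sketch of the layout math, with the question's 1920x1080 resolution plugged in:)

// NV21: full-resolution luma plane, then one interleaved V,U pair per 2x2 pixel block.
fun nv21SizeBytes(width: Int, height: Int): Int {
    val ySize = width * height                    // 1 byte of Y per pixel
    val vuSize = (width / 2) * (height / 2) * 2   // 2 bytes of chroma per 2x2 block
    return ySize + vuSize                         // = width * height * 3 / 2
}

fun main() {
    println(nv21SizeBytes(1920, 1080))            // 3110400, matching the figure in the question
}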

Homomorphic encryption using Palisade library

To all homomorphic encryption experts out there:
I'm using the PALISADE library:
int plaintextModulus = 65537;
float sigma = 3.2;
SecurityLevel securityLevel = HEStd_128_classic;
uint32_t depth = 2;
// Instantiate the crypto context
CryptoContext<DCRTPoly> cc = CryptoContextFactory<DCRTPoly>::genCryptoContextBFVrns(
    plaintextModulus, securityLevel, sigma, 0, depth, 0, OPTIMIZED);
Could you please explain (all) the parameters? I'm especially interested in the plaintext modulus, depth, and sigma.
Secondly I am trying to make a Packed Plaintext with the cc above.
cc->MakePackedPlaintext(array);
What is the maximum size of the array? On my local machine (8 GB RAM), when the array is larger than ~8000 int64 values I get a free(): invalid next size (normal) error.
Thank you for asking the question.
The plaintext modulus t is a critical parameter for BFV, as all operations are performed mod t. In other words, when you choose t, you have to make sure that all computations do not wrap around, i.e., do not exceed t. Otherwise you will get an incorrect answer, unless your goal is to compute something mod t.
sigma is the distribution parameter (used for the underlying Learning with Errors problem). You can just set it to 3.2; there is no need to change it.
Depth is the multiplicative depth of the circuit you are trying to compute. It has nothing to do with the size of the vectors. Basically, if you compute AxBxCxD, a naive approach gives you a depth of 3. BFV also supports more efficient binary-tree evaluation, i.e., (AxB)x(CxD), which reduces the depth to 2.
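As a hedged end-to-end sketch of that difference, reusing the parameters from the question (the enc lambda and the sample values are purely illustrative):

#include "palisade.h"
#include <iostream>
#include <vector>
using namespace lbcrypto;

int main() {
    // Same parameters as in the question: t = 65537, sigma = 3.2, depth = 2
    CryptoContext<DCRTPoly> cc = CryptoContextFactory<DCRTPoly>::genCryptoContextBFVrns(
        65537, HEStd_128_classic, 3.2, 0, 2, 0, OPTIMIZED);
    cc->Enable(ENCRYPTION);
    cc->Enable(SHE);

    auto keys = cc->KeyGen();
    cc->EvalMultKeyGen(keys.secretKey);   // relinearization keys needed for EvalMult

    auto enc = [&](int64_t v) {
        std::vector<int64_t> vals = {v};
        return cc->Encrypt(keys.publicKey, cc->MakePackedPlaintext(vals));
    };
    auto ctA = enc(2), ctB = enc(3), ctC = enc(4), ctD = enc(5);

    // Binary-tree order (AxB)x(CxD) consumes depth 2, which fits the context above;
    // the left-to-right chain ((AxB)xC)xD would consume depth 3.
    auto abcd = cc->EvalMult(cc->EvalMult(ctA, ctB), cc->EvalMult(ctC, ctD));

    Plaintext result;
    cc->Decrypt(keys.secretKey, abcd, &result);
    result->SetLength(1);
    std::cout << result << std::endl;     // expect 120
}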
BFV is a scheme that supports packing. By default, the size of packed ciphertext is equal to the ring dimension (something like 8192 for the example you mentioned). This means you can pack up to 8192 integers in your case. To support larger arrays/vectors, you would need to break them into batches of 8192 each and encrypt each one separately.
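A rough sketch of that batching idea (assuming a context and keys already set up as above; encryptInBatches is an illustrative helper, not part of the PALISADE API):

#include "palisade.h"
#include <algorithm>
#include <vector>
using namespace lbcrypto;

// Split `data` into chunks of at most ringDim values and encrypt each chunk
// separately, since one packed plaintext holds at most ringDim integers.
std::vector<Ciphertext<DCRTPoly>> encryptInBatches(
    CryptoContext<DCRTPoly> cc,
    const LPPublicKey<DCRTPoly>& pk,
    const std::vector<int64_t>& data)
{
    const size_t ringDim = cc->GetRingDimension();
    std::vector<Ciphertext<DCRTPoly>> out;
    for (size_t start = 0; start < data.size(); start += ringDim) {
        const size_t end = std::min(start + ringDim, data.size());
        std::vector<int64_t> chunk(data.begin() + start, data.begin() + end);
        Plaintext pt = cc->MakePackedPlaintext(chunk);
        out.push_back(cc->Encrypt(pk, pt));
    }
    return out;
}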
Regarding your application, the CKKS scheme would probably be a much better option (I will respond on the application in more detail in the other thread).
I have some experience with the SEAL library which also uses the BFV encryption scheme. The BFV scheme uses modular arithmetic and is able to encrypt integers (not real numbers).
For the parameters you're asking about:
The Plaintext Modulus is an upper bound for the input integers. If this parameter is too low, it might cause your integers to overflow (depending on how large they are, of course).
The Sigma is the distribution parameter for Gaussian noise generation
The Depth is the circuit depth which is the maximum number of multiplications on a path
Also, for the Packed Plaintext you should use vectors, not arrays. Maybe that will fix your problem. If not, try lowering the size and make several vectors if necessary.
You can determine the ring dimension (generated by the crypto context based on your parameter settings) by using cc->GetRingDimension() as shown in line 113 of https://gitlab.com/palisade/palisade-development/blob/master/src/pke/examples/simple-real-numbers.cpp

Array fits comfortably within available RAM, but a memory error still occurs when calling numpy.take on it

I have three arrays of shape (1029, 1146, 8, 5): H4, rowOffsets, and colOffsets. H4 is float32 while the other two are int. Assuming 4 bytes per array element, H4 has a cost of 188.7 MB.
My machine has 32 GB RAM total, with 18 currently available. I used platform.architecture() to verify that the Python interpreter is 64 bit, so that RAM ought to be available.
It seems like I'm nowhere near the memory limit, yet I get a memory error when I run the following:
shifted = np.take(H4, rowOffsets, 0, mode='clip')
I further tested this by running the code up to the Take call with a much larger input of (3000,3000,8,5). This consumed 7 times more memory yet also did not cause a memory error until the Take call.
So I figure I'm using Take wrong, there's a bug with it, or it consumes a massive amount of memory while executing. Can anyone help clarify what's happening here?
With multi-dimensional index arguments, take takes a full slice of all but the axis dimension for each entry in indices. The way you use it, the result has shape (1029, 1146, 8, 5, 1146, 8, 5), i.e. 1029 * 1146**2 * 8**2 * 5**2 * itemsize bytes, which is a lot and explains your memory problems.
You probably want to use take_along_axis instead.
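A minimal sketch of what that might look like, assuming rowOffsets has the same shape as H4 and indexes along axis 0 (note that take_along_axis has no mode='clip', so the indices are clipped explicitly; shapes are shrunk here so it runs quickly):

import numpy as np

# Stand-in arrays with the same rank as in the question
# (the original shape is (1029, 1146, 8, 5)).
H4 = np.random.rand(10, 12, 8, 5).astype(np.float32)
rowOffsets = np.random.randint(-3, 15, size=H4.shape)

# take_along_axis picks one element per position instead of a full slice per
# index, so the result keeps H4's shape instead of blowing up.
idx = np.clip(rowOffsets, 0, H4.shape[0] - 1)
shifted = np.take_along_axis(H4, idx, axis=0)

print(shifted.shape)   # (10, 12, 8, 5) -- same as H4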

Why are large numpy arrays 64-byte aligned but not smaller ones

The following code:
import numpy as np

prev = []
addresses = []
for i in range(10000):
    a = np.ones(x).astype(np.float32)   # x is the array length being varied
    prev.append(a)
    address = a.__array_interface__['data'][0]
    assert address % 64 == 0
    assert address not in addresses
    addresses.append(address)
Will not raise an AssertionError for values of x > 252, suggesting that arrays bigger than 253 elements (or bigger than 505 when using float16) are aligned differently from smaller arrays. What is the reason for this?
I am on OS X (Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz) running numpy 1.12.1.
Your test loop isn't accomplishing exactly what you expect. Since only one array exists in memory at a time, it's quite possible - indeed LIKELY - that new ones will be allocated at the same memory address as the one just freed. You'd have to do something like append the arrays to a list (thus making them all exist in memory simultaneously) to actually test 10000 distinct allocations.
However, I can easily believe that you're seeing a real effect, as it's perfectly reasonable for a memory allocator to use different strategies based on the size of the block being allocated. For example, at some point the allocator may stop trying to use memory it already has and start requesting entire memory pages directly from the operating system. Once that threshold is reached, you'd find that everything is aligned on a much higher power-of-2 boundary than 64 - perhaps 4096. You seem to be hitting some intermediate threshold at 1024 bytes (including overhead); it might be interesting to test for 128/256/512/1024-byte alignment.
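A quick way to probe that, building on the loop from the question (the sizes and boundaries below are arbitrary choices for illustration):

import numpy as np

def max_alignment(address, boundaries=(8, 16, 32, 64, 128, 256, 512, 1024, 4096)):
    # Largest tested power-of-2 boundary that the address is aligned to.
    return max((b for b in boundaries if address % b == 0), default=1)

for x in (16, 252, 253, 1024, 4096):
    # Keep every array alive in a list so addresses cannot be reused.
    arrays = [np.ones(x, dtype=np.float32) for _ in range(1000)]
    addresses = [a.__array_interface__['data'][0] for a in arrays]
    worst = min(max_alignment(addr) for addr in addresses)
    print(f"x={x}: every allocation aligned to at least {worst} bytes")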
Here is my guess: Using aligned memory typically involves allocating a larger block, and then releasing the upfront bytes that are allocated before the alignment boundary.
This is insignificant for large arrays, but for small arrays the fragmentation and overhead introduced likely outweigh the benefits.
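That over-allocate-and-offset idea can be sketched in NumPy itself (purely illustrative; this is not what NumPy's allocator does internally, just the general technique of allocating a larger block and skipping the bytes before the alignment boundary):

import numpy as np

def aligned_zeros(n, dtype=np.float32, boundary=64):
    # Allocate `boundary` extra bytes, then start the view at the next
    # address that is a multiple of `boundary`.
    itemsize = np.dtype(dtype).itemsize
    buf = np.zeros(n * itemsize + boundary, dtype=np.uint8)
    start = (-buf.ctypes.data) % boundary
    return buf[start:start + n * itemsize].view(dtype)

a = aligned_zeros(10)
assert a.ctypes.data % 64 == 0
print(a)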

How to get exact size of image in bytes?

I have calculated the image size in bytes by converting the image into NSData, but its data length gives the wrong value.
NSData *data = UIImageJPEGRepresentation(image, 0.5);
NSLog(@"image size in bytes %lu", (unsigned long)data.length);
Actually, data.length here is not returning the wrong value; it's just the result of the lossy conversion/reversion from a UIImage to NSData.
Setting the compressionQuality to the lowest compression possible of 1.0 in UIImageJPEGRepresentation will not return the original image. Although the image metadata is stripped in this process, the function can and usually will yield an object larger than the original. Note that this increase in file size does NOT increase the quality of the image over the compressed original either. JPEGs are highly compressed to begin with, which is why they are used so often, and the function is uncompressing the image and then recompressing it. It's kind of like getting botox after age has stretched your body out: it might look similar to the original, but the insides are just not as good as they used to be.
You could conditionally use a lower compressionQuality for larger files, keeping it close to 1.0, as the quality drops off quickly below that. Other than that, depending on the final purpose of your images, the only other option would be to resize the image or adjust its resolution, perhaps in addition to adjusting the compression ratio. That change will drastically curtail data usage. Web and mobile usage typically doesn't need the resolution of something like images meant for digital print.
You can write some code that adjusts each image and NSData representation only as much as needed to fit its individual data constraint.
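A rough sketch of that idea in Objective-C (maxBytes and the 0.1 step are arbitrary illustrative choices; in practice you might also resize the image before or instead of lowering the quality):

#import <UIKit/UIKit.h>

// Re-encode the image at progressively lower JPEG quality until the data
// fits within maxBytes (or a quality floor is reached).
static NSData *JPEGDataFittingSize(UIImage *image, NSUInteger maxBytes)
{
    CGFloat quality = 1.0;
    NSData *data = UIImageJPEGRepresentation(image, quality);
    while (data.length > maxBytes && quality > 0.15) {
        quality -= 0.1;
        data = UIImageJPEGRepresentation(image, quality);
    }
    return data;
}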