What does pic_width_in_luma_samples mean in H265 format?
The user manual says:
pic_width_in_luma_samples specifies the width of each decoded picture in units of luma samples.
pic_width_in_luma_samples shall not be equal to 0 and shall be an integer multiple of MinCbSizeY.
But what are luma samples? And how do I get the pixel width from the luma samples?
Referring to the HEVC guide: to save bandwidth, HEVC expresses the width in multiples of 16, so:
videoWidth = (pic_width_in_luma_samples + 1) * 16;
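For concreteness, here is a minimal sketch of the constraint quoted from the manual, assuming the SPS fields have already been parsed and that MinCbSizeY is derived from the syntax element log2_min_luma_coding_block_size_minus3 as in the HEVC spec. This only validates the field; it is not a decoder:

#include <stdio.h>

/* Sketch of the SPS constraint: pic_width_in_luma_samples shall not be 0
 * and shall be an integer multiple of MinCbSizeY. */
int checkPicWidth(unsigned picWidthInLumaSamples,
                  unsigned log2MinLumaCodingBlockSizeMinus3)
{
    unsigned minCbLog2SizeY = log2MinLumaCodingBlockSizeMinus3 + 3;
    unsigned minCbSizeY     = 1u << minCbLog2SizeY;  /* 8 when the element is 0 */

    if (picWidthInLumaSamples == 0 ||
        picWidthInLumaSamples % minCbSizeY != 0) {
        fprintf(stderr, "invalid pic_width_in_luma_samples\n");
        return 0;  /* violates the constraint from the manual */
    }
    return 1;
}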
Related
I'm currently using the TensorFlow Object Detection API. The default SSD-MobileNet v1 uses 300 x 300 images as training input, but I want to set the image width and height to different values, for instance 320 x 180. Do the aspect ratios in the .config represent the real width/height ratio of the anchors, or are they only meant for square images?
You can change the size to a different value; the general guidance is to preserve the aspect ratio of the original image, while the size itself can be a different value.
Aspect ratios represent the real width/height ratio of the anchors. You can use them with different input ratios, but you will get the best results if you use an input ratio close to square.
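For reference, the pieces being discussed sit in the model's pipeline.config. A trimmed excerpt along these lines (the values shown are the stock SSD-MobileNet v1 defaults, so treat this as an illustration rather than your exact file):

image_resizer {
  fixed_shape_resizer {
    height: 300   # e.g. change to 180
    width: 300    # e.g. change to 320
  }
}
anchor_generator {
  ssd_anchor_generator {
    num_layers: 6
    min_scale: 0.2
    max_scale: 0.95
    aspect_ratios: 1.0      # anchor width/height ratios
    aspect_ratios: 2.0
    aspect_ratios: 0.5
    aspect_ratios: 3.0
    aspect_ratios: 0.3333
  }
}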
I'm trying to use NVIDIA NPP to experiment with some image resizing routines. I want to resize to an exact dimension. I've been looking at image resizing using NVIDIA NPP but all of its resize functions take scale factors for X and Y Dimensions, and I could not see any API taking direct destination dimensions.
As an example, this is one API:
NppStatus nppiResizeSqrPixel_8u_C1R(const Npp8u * pSrc, NppiSize oSrcSize, int nSrcStep, NppiRect oSrcROI, Npp8u * pDst, int nDstStep, NppiRect oDstROI, double nXFactor, double nYFactor, double nXShift, double nYShift, int eInterpolation);
I realize one way could be to work out the appropriate scale factor for the destination dimension, but we don't know exactly how the API decides the destination ROI based on the scale factor (since it is floating-point math). We could reverse the calculation in the jpegNPP sample to find the scale factor, but the API itself does not make any guarantees, so I'm not sure how safe that is. Any ideas what the possibilities are?
As a side question, the API also takes two params, nXShift and nYShift, but the documentation just says "Source pixel shift in x-direction". I'm not exactly clear what shift is being talked about here. Do you have an idea?
If I wanted to map the whole SRC image to the smaller rectangle in the DST image as shown in the image below, I would use xFactor = yFactor = 0.5, xShift = 0.5*DST.width and yShift = 0.
Mapping src to half size destination image
In other words, the pixel at (x,y) in the SRC is mapped to the pixel (x',y') in the DST as
x' = xFactor * x + xShift
y' = yFactor * y + yShift
In this case, both the source and dest ROI could be the entire support of the respective images.
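Building on that mapping, here is a hedged sketch of resizing to an exact destination size: choose nXFactor = dstWidth/srcWidth and nYFactor = dstHeight/srcHeight with zero shift, and let both ROIs cover their full images. This assumes the linear mapping quoted above holds exactly; because of floating-point rounding you may still want to verify the last row/column of the output.

#include <nppi.h>

/* Sketch: resize an 8-bit single-channel device image to an exact
 * destination size using the x' = xFactor*x + xShift mapping above.
 * pSrc/pDst are device pointers; steps are row strides in bytes. */
NppStatus resizeToExact(const Npp8u *pSrc, NppiSize srcSize, int srcStep,
                        Npp8u *pDst, NppiSize dstSize, int dstStep)
{
    NppiRect srcROI = { 0, 0, srcSize.width, srcSize.height };
    NppiRect dstROI = { 0, 0, dstSize.width, dstSize.height };

    /* Assumed choice, not an NPP guarantee: scale factors taken directly
     * from the ratio of destination to source dimensions, with no shift. */
    double xFactor = (double)dstSize.width  / (double)srcSize.width;
    double yFactor = (double)dstSize.height / (double)srcSize.height;

    return nppiResizeSqrPixel_8u_C1R(pSrc, srcSize, srcStep, srcROI,
                                     pDst, dstStep, dstROI,
                                     xFactor, yFactor,
                                     0.0 /* nXShift */, 0.0 /* nYShift */,
                                     NPPI_INTER_LINEAR);
}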
I've started editing the RaspiStillYUV.c code. I eventually want to process the image I receive, but for now, I'm just working to understand it. Why am I working with YUV instead of RGB? So I can learn something new. I've made minor changes to the function camera_buffer_callback. All I am doing is the following:
fprintf(stderr, "GREAT SUCCESS! %d\n", buffer->length);
The line this is replacing:
bytes_written = fwrite(buffer->data, 1, buffer->length, pData->file_handle);
Now, the dimensions should be 2592 x 1944 (w x h) as set in the code. Working off of Wikipedia (YUV420), I have come to the conclusion that the file size should be w * h * 1.5, since the Y component has 1 byte of data for each pixel and the U and V components each have 1 byte of data for every 4 pixels (1 + 1/4 + 1/4 = 1.5). Great. Doing the math in Python:
>>> 2592 * 1944 * 1.5
7558272.0
Unfortunately, this does not line up with the output of my program:
GREAT SUCCESS! 7589376
That leaves a difference of 31104 bytes.
I figure that the buffer is allocated in fixed-size chunks (the output size is evenly divisible by 512). While I would like to understand that mystery, I'm fine with the fixed-size-chunk explanation.
My question is if I am missing something. Are the extra bytes beyond the expected size meaningful in this format? Should they be ignored? Are my calculations off?
The documentation at this location supports your theory on padding: http://www.raspberrypi.org/wp-content/uploads/2013/07/RaspiCam-Documentation.pdf
Specifically:
Note that the image buffers saved in raspistillyuv are padded to a
horizontal size divisible by 16 (so there may be unused bytes at the
end of each line to made the width divisible by 16). Buffers are also
padded vertically to be divisible by 16, and in the YUV mode, each
plane of Y,U,V is padded in this way.
So my interpretation of this is the following.
The width is 2592 (divisible by 16 so this is ok).
The height is 1944, which is 8 rows short of being divisible by 16, so an extra 8 * 2592 luma samples are added (also multiplied by 1.5 to account for the chroma planes), giving your 31104 extra bytes.
Although this kind of helps with the size of the file, it doesn't fully explain the structure of the YUV output. I am having a look at this description to see if it provides a hint to start with: http://en.wikipedia.org/wiki/YUV#Y.27UV420p_.28and_Y.27V12_or_YV12.29_to_RGB888_conversion
From this I believe it is as follows:
Y Channel:
2592 * (1944+8) = 5059584
U Channel:
1296 * (972+4) = 1264896
V Channel:
1296 * (972+4) = 1264896
Giving a sum of:
5059584 + 2*1264896 = 7589376
This makes the numbers add up, so the only thing left is to confirm whether this interpretation is correct.
I am also trying to decode the YUV (for image comparisons), so if you can confirm that this actually corresponds to what you are reading in the YUV file, it would be much appreciated.
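As a sanity check on that interpretation, here is a small sketch that pads each plane's width and height up to a multiple of 16 and reproduces the numbers above (the firmware's exact alignment rules may differ, so treat this as illustrative):

#include <stdio.h>

/* Round n up to the next multiple of 16. */
static unsigned pad16(unsigned n) { return (n + 15u) & ~15u; }

/* Buffer size for padded YUV420: a full-resolution Y plane plus two
 * half-resolution chroma planes, each padded to multiples of 16. */
static unsigned yuv420PaddedSize(unsigned w, unsigned h)
{
    unsigned ySize = pad16(w)     * pad16(h);      /* 2592 * 1952 */
    unsigned cSize = pad16(w / 2) * pad16(h / 2);  /* 1296 * 976  */
    return ySize + 2u * cSize;
}

int main(void)
{
    printf("%u\n", yuv420PaddedSize(2592, 1944)); /* prints 7589376 */
    return 0;
}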
You have to read the manual carefully. Buffers are padded to multiples of 16, but the colour data is half-size, so your image size needs to be a multiple of 32 to avoid the padding breaking external software.
I have a user's animated GIF file that is about 10 MB. I'd like to allow users to upload it and have me host it on my server, but I'd like to rescale it to fit a maximum file size of 5 MB to conserve bandwidth when it is hotlinked.
I have a basic method right now that determines a targetWidth and targetHeight based on pixel surface area.
It works well enough:
CGFloat aspectRatio = originalHeight / originalWidth;
CGFloat reductionFactor = desiredFileSize / originalFileSize;
CGFloat targetSurfaceArea = originalSurfaceArea * reductionFactor;
int targetHeight = targetSurfaceArea / sqrt(targetSurfaceArea/aspectRatio);
int targetWidth = targetSurfaceArea / targetHeight;
It's fairly accurate; example results: a 27 MB file turns into 3.3 MB, and a 13.9 MB file turns into 5.5 MB.
I would like to tune this accuracy to get much closer to 5 MB, and I was hoping someone knows a bit more about how GIF colour count / frame count could be factored into this algorithm. Thanks.
Not sure you're going to find an easy way to do this. Projecting the compressed size of a file without running the compression algorithm seems to me to be non-deterministic.
However, if you have plenty of compute cycles you could use an approximation-based approach. Use the algorithm above to give you a first resize of the image. If the resulting file is larger than 5 MB, move the resize percentage halfway towards the last value that was too small and try again; if it is smaller than 5 MB, move it halfway towards the last value that was too large. Repeat until you get sufficiently close to 5 MB (a sketch of this search follows the example below).
So, for example
50% = 3.3 MB, so try halfway between 50 and 100
75% = 6.1 MB, so try halfway between 75 and 50
62.5% = 4.7 MB, so try halfway between 62.5 and 75
etc
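A minimal sketch of that search, assuming a hypothetical resizeGifAndGetSize() helper that performs the resize/re-encode at a given percentage of the original dimensions and returns the resulting file size in bytes (your actual resize call goes there):

#include <math.h>

/* Hypothetical helper: resize/re-encode the GIF at `percent` of the
 * original dimensions and return the resulting file size in bytes. */
extern double resizeGifAndGetSize(double percent);

/* Bisection on the resize percentage until the output size is within
 * `tolerance` bytes of `targetSize`, or `maxIterations` is reached. */
double findResizePercent(double targetSize, double tolerance, int maxIterations)
{
    double lo = 0.0, hi = 100.0;   /* bounds on the resize percentage */
    double percent = 50.0;         /* first guess, e.g. from the area formula */

    for (int i = 0; i < maxIterations; i++) {
        double size = resizeGifAndGetSize(percent);
        if (fabs(size - targetSize) <= tolerance)
            break;
        if (size > targetSize)
            hi = percent;          /* too big: search lower percentages  */
        else
            lo = percent;          /* too small: search higher percentages */
        percent = 0.5 * (lo + hi); /* halfway between the current bounds */
    }
    return percent;
}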
I want to know how much data can be embedded into images of different sizes.
For example, in a 30 kB image file, how much data can be stored without distorting the image?
It depends on the image type and the algorithm. If I take as an example a 24-bit bitmap image used to store ASCII characters:
Number of ASCII characters that can be stored = number of pixels / 8 (one ASCII character = 8 bits).
It depends on two points:
How many bits per pixel there are in your image.
How many bits you will embed in one pixel.
OK, let's suppose that your colour model is RGB and each pixel = 8*3 bits (one byte for each colour), and you want to embed 3 bits in each pixel.
Data that can be embedded into the image = (number of pixels * 3) bits.
If you use the LSB of each byte to hide your information, a 30 kB image gives roughly 30,000 bits of available space, i.e. 3,750 bytes.
Since the LSB contributes 1 out of the 256 values a byte can take (0-255), the worst-case scenario, where you modify every LSB, gives a distortion of 1/256, which equals about 0.4%.
In the statistically average scenario you would get about 0.2% distortion.
So it also depends on which bit of the byte you are going to change.
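To make the LSB idea concrete, here is a small illustrative sketch (not a full steganography tool) that hides a payload one bit per image byte, matching the one-bit-per-byte capacity figure above:

#include <stddef.h>
#include <stdint.h>

/* Hide `dataLen` bytes of `data` in the least significant bits of
 * `image` (raw pixel bytes, e.g. interleaved RGB). Returns 0 on success,
 * -1 if the image is too small. Capacity = imageLen / 8 bytes. */
int lsbEmbed(uint8_t *image, size_t imageLen,
             const uint8_t *data, size_t dataLen)
{
    if (dataLen * 8 > imageLen)
        return -1;                                    /* not enough carrier bytes */

    for (size_t i = 0; i < dataLen * 8; i++) {
        uint8_t bit = (data[i / 8] >> (7 - (i % 8))) & 1u;  /* next payload bit */
        image[i] = (uint8_t)((image[i] & ~1u) | bit);       /* overwrite the LSB */
    }
    return 0;
}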