Vulkan swapchain image extent capabilities

I am querying the swapchain capabilities and checking the currentExtent, minImageExtent and maxImageExtent properties of VkSurfaceCapabilitiesKHR.
For a window size of 128x128 I get:
currentExtent = 148x128
minImageExtent = 148x128
maxImageExtent = 148x128
But for a window size of 256x256 I get:
currentExtent = 256x256
minImageExtent = 256x256
maxImageExtent = 256x256
For 1280x720:
currentExtent = 1280x720
minImageExtent = 1280x720
maxImageExtent = 1280x720
I have two questions:
Why does the width differ from the window width for 128x128?
Why are current, min and max identical for the other sizes?
My hardware: NVIDIA RTX 3000, Driver version 431.86, Windows 10

Q1: Feels like a bug (yours, or driver).
Q2: Because it works like that on some platforms. See the specification, e.g.:
With Win32, minImageExtent, maxImageExtent, and currentExtent must always equal the window size.
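As a minimal sketch of how an application typically picks the swapchain extent from these three fields: the code below is plain Python standing in for the Vulkan structs, and the Extent class and choose_extent helper are hypothetical names, not part of any Vulkan binding. When currentExtent holds a concrete size, as it does on Win32, that size is used as-is. Only the special value 0xFFFFFFFF leaves the choice to the application, which then clamps the window size between minImageExtent and maxImageExtent.

```python
from dataclasses import dataclass

# Special currentExtent value meaning "the surface size is decided by the swapchain".
UNDEFINED = 0xFFFFFFFF


@dataclass(frozen=True)
class Extent:
    """Hypothetical stand-in for VkExtent2D."""
    width: int
    height: int


def clamp(value, low, high):
    return max(low, min(value, high))


def choose_extent(current, min_extent, max_extent, window):
    """Pick the swapchain image extent from the surface capabilities."""
    if current.width != UNDEFINED:
        # The platform dictates the size (always the case on Win32).
        return current
    return Extent(
        clamp(window.width, min_extent.width, max_extent.width),
        clamp(window.height, min_extent.height, max_extent.height),
    )


# Values reported in the question for the 128x128 window.
reported = Extent(148, 128)
print(choose_extent(reported, reported, reported, Extent(128, 128)))
```

With the values from the question, the requested 128x128 size is ignored and the swapchain has to be created at 148x128, whatever the reason for that width turns out to be.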


GluonCV - Object detection, set mx.ctx to GPU, but still using all CPU cores

I’m running an object detection routine on a server.
I set the context to the GPU, and I'm loading the model, the parameters and the data on the GPU. The program reads from a video file or from an RTSP stream, using OpenCV.
When using nvidia-smi, I see that the selected GPU usage is at 20%, which is reasonable. However, the object detection routine is still using 750-1200 % of the CPU (basically, all of the available cores of the server).
This is the code:
import cv2
import gluoncv as gcv
import mxnet as mx


def main():
    ctx = mx.gpu(3)
    # -------------------------
    # Load a pretrained model
    # -------------------------
    net = gcv.model_zoo.get_model('ssd_512_mobilenet1.0_coco', pretrained=True)
    # (moving the parameters to ctx, e.g. net.collect_params().reset_ctx(ctx), is not shown in the post)
    # Load the webcam handler
    cap = cv2.VideoCapture("video/video_01.mp4")
    count_frame = 0
    while True:  # loop header was lost from the post; restored so the snippet parses
        print(f"Frame: {count_frame}")
        # Load frame from the camera
        ret, frame = cap.read()
        if (cv2.waitKey(25) & 0xFF == ord('q')) or not ret:
            break
        # Image pre-processing
        frame = mx.nd.array(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).astype('uint8')
        frame_nd, frame_np = gcv.data.transforms.presets.ssd.transform_test(frame, short=512, max_size=700)
        if isinstance(frame_nd, mx.ndarray.ndarray.NDArray):
            ...  # body not preserved in the post
        # Run frame through network
        frame_nd = frame_nd.as_in_context(ctx)
        class_IDs, scores, bounding_boxes = net(frame_nd)
        if isinstance(class_IDs, mx.ndarray.ndarray.NDArray):
            ...  # body not preserved in the post
        if isinstance(scores, mx.ndarray.ndarray.NDArray):
            ...  # body not preserved in the post
        if isinstance(bounding_boxes, mx.ndarray.ndarray.NDArray):
            ...  # body not preserved in the post
        count_frame += 1
This is the output of nvidia-smi:
while this is the output of top:
The pre-processing operations are running on the CPU:
frame = mx.nd.array(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).astype('uint8')
frame_nd, frame_np = gcv.data.transforms.presets.ssd.transform_test(frame, short=512, max_size=700)
but is that enough to justify such high CPU usage? If so, can I run them on the GPU as well?
EDIT: I modified and copied the whole code, in response to Olivier_Cruchant's comment (thanks!)
Your CPU is likely busy because of the pre-processing load and the frequent back-and-forth between host memory and the GPU, since inference seems to be running frame by frame.
I would suggest trying the following:
Run batched inference (send a batch of N frames to the network) to increase GPU usage and reduce communication.
Try using NVIDIA DALI to make better use of the GPU for data ingestion and pre-processing (DALI MXNet reference, DALI mp4 ingestion pytorch example).
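The batching suggestion can be sketched as follows. This is only an illustration of the grouping step, written with NumPy so it runs without MXNet. batch_frames is a hypothetical helper, and it assumes every frame has already been pre-processed to the same shape. Each yielded array would then be converted and sent to the network in one call instead of N separate calls.

```python
import numpy as np


def batch_frames(frames, batch_size):
    """Group same-shaped frames into arrays of shape (n, ...) with n <= batch_size."""
    batch = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == batch_size:
            yield np.stack(batch)
            batch = []
    if batch:
        # Last, possibly smaller, batch at the end of the stream.
        yield np.stack(batch)


# Ten dummy 4x6 RGB frames standing in for decoded video frames.
dummy_frames = (np.zeros((4, 6, 3), dtype=np.uint8) for _ in range(10))
for batch in batch_frames(dummy_frames, batch_size=4):
    print(batch.shape)
```

A larger batch size means fewer host-to-GPU transfers per frame, at the cost of extra latency before the first result of each batch is available.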

Pointcloud and RGB Image alignment on RealSense ROS

I am working on a dog detection system using deep learning (TensorFlow object detection) and a RealSense D425 camera. I am using the Intel(R) RealSense(TM) ROS Wrapper to get images from the camera.
I run "roslaunch rs_rgbd.launch" and my Python code subscribes to the "/camera/color/image_raw" topic to get the RGB image. Using this image and the object detection library, I can infer (at 20 fps) the location of a dog in an image as a bounding box (xmin, xmax, ymin, ymax).
I would like to crop the point cloud using the object detection result (xmin, xmax, ymin, ymax) and determine whether the dog is far from or near the camera. I would like to use the pixel-by-pixel alignment between the RGB image and the point cloud.
How can I do it? Is there any topic for that?
Thanks in advance
Intel publishes a Python notebook about the same problem at:
What they do is as follows:
Get the color frame and the depth frame (the point cloud in your case).
Align the depth to the color.
Use SSD to detect the dog inside the color frame.
Get the average depth for the detected dog and convert it to meters.
import cv2
import numpy as np

# aligned_depth_frame, profile, className and the *_depth box bounds come from the notebook.
depth = np.asanyarray(aligned_depth_frame.get_data())
# Crop depth data. NumPy images are indexed [row, column], i.e. [y, x].
depth = depth[ymin_depth:ymax_depth, xmin_depth:xmax_depth].astype(float)
# Get data scale from the device and convert to meters
depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()
depth = depth * depth_scale
dist, _, _, _ = cv2.mean(depth)
print("Detected a {0} {1:.3} meters away.".format(className, dist))
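A self-contained variant of the cropping step, as a sketch under a few assumptions: the depth image is already aligned to the color image, the box is given as (xmin, ymin, xmax, ymax) in pixels, and a raw depth of 0 means "no measurement". box_distance_m is a hypothetical helper name. It uses the median of the valid pixels instead of the mean used above, which is a swapped-in choice to make the estimate less sensitive to holes and background pixels inside the box.

```python
import numpy as np


def box_distance_m(depth_image, box, depth_scale):
    """Estimate the distance (in meters) of the object inside a bounding box.

    depth_image: 2D array of raw depth units, aligned to the color image.
    box: (xmin, ymin, xmax, ymax) in pixel coordinates of the color image.
    depth_scale: meters per raw depth unit, as reported by the device.
    """
    xmin, ymin, xmax, ymax = box
    # Rows are y and columns are x, so y comes first when slicing.
    roi = depth_image[ymin:ymax, xmin:xmax].astype(float)
    valid = roi[roi > 0]  # assumption: 0 marks pixels without depth data
    if valid.size == 0:
        return None
    return float(np.median(valid)) * depth_scale


# Synthetic 640x480 depth image with an object 2000 raw units away.
depth_image = np.zeros((480, 640), dtype=np.uint16)
depth_image[100:200, 300:400] = 2000
print(box_distance_m(depth_image, (300, 100, 400, 200), depth_scale=0.001))
```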
Hope this helps.

Using VK_IMAGE_TILING_LINEAR for srgb formats?

So I'm still trying to figure out color spaces for render textures, and how to create images without color banding. Within my gbuffer I use VK_FORMAT_R8G8B8A8_UNORM with VK_IMAGE_TILING_OPTIMAL for my albedo texture and VK_FORMAT_A2B10G10R10_UNORM_PACK32 with VK_IMAGE_TILING_OPTIMAL for my normal and emission render textures. For my brightness texture (which holds pixel values that are considered "bright" within a scene) and glow render texture (the final texture for bloom effects to be added later onto the final scene), I use VK_FORMAT_R8G8B8A8_SRGB and VK_IMAGE_TILING_OPTIMAL (although looking at this, I should probably make my bright and final textures R16G16B16A16 float formats instead). What I got was definitely not what I had in mind:
Changing the tiling for my normal, emission, glow, and brightness textures to VK_IMAGE_TILING_LINEAR, however, got me nice results instead, but at the cost of performance:
The nice thing about it, though (and sorry about the weird border on the top left, I was cropping in MS Paint...), is that the image doesn't suffer from color banding, as it does when I instead use VK_FORMAT_R8G8B8A8_UNORM with VK_IMAGE_TILING_OPTIMAL for these textures:
There you can see banding occurring on the top left of the helmet, as well as underneath it (where the black tubes are). Of course, I've heard that VK_IMAGE_TILING_LINEAR should be avoided, from this post.
In general, I'm having trouble figuring out the best way to avoid VK_IMAGE_TILING_LINEAR when using sRGB textures. I would still like to keep the nice crisp images that sRGB gives me, but I am unsure how to solve this issue. The link might actually have the solution, but I'm not sure whether there's a way to apply it to my gbuffer.
I would also like to clarify that VK_IMAGE_TILING_OPTIMAL works fine on Nvidia GPUs (well, tested on a GTX 870M), which instead complain about using VK_IMAGE_TILING_LINEAR with an sRGB format. Intel GPUs, however, work fine with VK_IMAGE_TILING_LINEAR and sort of crap out, like the first image at the top of this post, when using VK_IMAGE_TILING_OPTIMAL.
The engine is custom made, feel free to check it out in this link
If you fancy some code, I use a function called SetUpRenderTextures() inside Engine/Renderer/Private/Renderer.cpp file, under line 1396:
VkImageCreateInfo cImageInfo = { };
VkImageViewCreateInfo cViewInfo = { };
// TODO(): Need to make this more adaptable, as intel chips have trouble with srgb optimal tiling.
cImageInfo.imageType = VK_IMAGE_TYPE_2D;
cImageInfo.format = VK_FORMAT_R8G8B8A8_UNORM;
cImageInfo.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
cImageInfo.mipLevels = 1;
cImageInfo.extent.depth = 1;
cImageInfo.arrayLayers = 1;
cImageInfo.extent.width = m_pWindow->Width();
cImageInfo.extent.height = m_pWindow->Height();
cImageInfo.samples = VK_SAMPLE_COUNT_1_BIT;
cImageInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
cImageInfo.tiling = VK_IMAGE_TILING_OPTIMAL;
cViewInfo.format = VK_FORMAT_R8G8B8A8_UNORM;
cViewInfo.image = nullptr; // No need to set the image, texture->Initialize() handles this for us.
cViewInfo.viewType = VK_IMAGE_VIEW_TYPE_2D;
cViewInfo.subresourceRange = { };
cViewInfo.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
cViewInfo.subresourceRange.baseArrayLayer = 0;
cViewInfo.subresourceRange.baseMipLevel = 0;
cViewInfo.subresourceRange.layerCount = 1;
cViewInfo.subresourceRange.levelCount = 1;
gbuffer_Albedo->Initialize(cImageInfo, cViewInfo);
gbuffer_Emission->Initialize(cImageInfo, cViewInfo);
cImageInfo.format = VK_FORMAT_R8G8B8A8_SRGB;
cViewInfo.format = VK_FORMAT_R8G8B8A8_SRGB;
cImageInfo.tiling = VK_IMAGE_TILING_OPTIMAL;
GlowTarget->Initialize(cImageInfo, cViewInfo);
// It's probably best that these be 64bit float formats as well...
pbr_Bright->Initialize(cImageInfo, cViewInfo);
pbr_Final->Initialize(cImageInfo, cViewInfo);
cImageInfo.format = VK_FORMAT_A2B10G10R10_UNORM_PACK32;
cImageInfo.tiling = VK_IMAGE_TILING_OPTIMAL;
cViewInfo.format = VK_FORMAT_A2B10G10R10_UNORM_PACK32;
gbuffer_Normal->Initialize(cImageInfo, cViewInfo);
gbuffer_Position->Initialize(cImageInfo, cViewInfo);
So yes, the rundown: how can I avoid using linear image tiling for sRGB textures? Is this a hardware-specific thing, and is it mandatory? Also, I apologize for any ignorance I have on this subject.
Thank you :3
Support for this combination is mandatory, so the corruption in your first image is either an application or a driver bug.
So VK_FORMAT_R8G8B8A8_SRGB is working with VK_IMAGE_TILING_OPTIMAL on your Nvidia GPU but not on your Intel GPU? But VK_FORMAT_R8G8B8A8_SRGB does work with VK_IMAGE_TILING_LINEAR on the Intel GPU?
If so, that sounds like you've got some missing or incorrect image layout transitions. Intel is more sensitive to getting those right than Nvidia is. Do the validation layers complain about anything? You need to make sure the image is VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL when rendering to it, and VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL when sampling from it.

Darknet YOLO image size

I am trying to train a custom object classifier in Darknet YOLO v2.
I gathered a dataset of images; most of them are 6000 x 4000 px, and some have lower resolutions.
Do I need to resize the images to be square before training?
I found that the config uses:
saturation = 1.5
exposure = 1.5
That's why I was wondering how to use it for data sets with different image sizes.
You don't have to resize them, because Darknet will do it for you!
This means you can use different image sizes during training. What you posted above is just the network configuration; there should be a full network definition as well. The height and width there give the network resolution. Darknet also keeps the aspect ratio; check e.g. this.
You don't need to resize your dataset images. pjreddie's YOLO implementation does it by itself, preserving the aspect ratio (no information is lost), according to the resolution in the .cfg file.
For example, if your image size is 1248 x 936, YOLO will resize it to 416 x 312 and then pad the extra space with black bars to fit the 416 x 416 network.
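The arithmetic behind that example can be sketched as below. letterbox_dims is a hypothetical helper, not a Darknet function: it scales the image so that it fits inside the network resolution while keeping the aspect ratio, and then reports how much padding is left on each axis. Integer arithmetic is used on purpose so the sizes are not affected by floating-point rounding.

```python
def letterbox_dims(img_w, img_h, net_w=416, net_h=416):
    """Return (new_w, new_h, pad_x, pad_y) for an aspect-preserving resize."""
    if net_w * img_h <= net_h * img_w:
        # Width is the limiting side: fill the network width exactly.
        new_w = net_w
        new_h = img_h * net_w // img_w
    else:
        # Height is the limiting side: fill the network height exactly.
        new_h = net_h
        new_w = img_w * net_h // img_h
    pad_x = (net_w - new_w) // 2  # padding on the left (the rest goes right)
    pad_y = (net_h - new_h) // 2  # padding on the top (the rest goes below)
    return new_w, new_h, pad_x, pad_y


print(letterbox_dims(1248, 936))   # the example above
print(letterbox_dims(6000, 4000))  # the image size from the question
```

For the 1248 x 936 image this gives a 416 x 312 resized image with 52 rows of padding above and below it.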
It is very common to resize images before training. 416x416 is slightly larger than common. Most imagenet models resize and square the images to 256x256 for example. So I would expect the same here. Trying to train on 6000x4000 is going to require a farm of GPUs. The standard process is to square the image to the largest dimension (height, or width), padding with 0's on the shorter side, then resizing using standard image resizing tools like PIL.
You do not need to resize the images; you can directly change the values in the darknet.cfg file.
When you open the darknet.cfg (yolo-darknet.cfg) file, you can see all the hyper-parameters and their values.
As shown in your cfg file, the image dimensions are (416, 416) -> (width, height). You can change these values, and Darknet will automatically resize the images before training.
Since the images have high dimensions, you can lower the batch and subdivisions values (e.g. 32, 16, 8; they have to be multiples of 2) so that Darknet will not crash with a memory allocation error.
By default the Darknet API resizes the images in both inference and training, but in theory any input size w = h = 32 × X, where X is a natural number, should work (w is the width, h the height). By default X = 13, so the input size is (w, h) = (416, 416). I use this rule with YOLOv3 in OpenCV, and it works better the bigger X is.
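The 32 × X rule can be written down as a small sketch. Both helpers are hypothetical names, and the stride of 32 is taken from the rule stated above rather than read from any cfg file.

```python
STRIDE = 32  # from the "32 x X" rule above


def network_size(x):
    """Network input side length for a natural number x (x = 13 gives 416)."""
    if x < 1:
        raise ValueError("x must be a natural number")
    return STRIDE * x


def round_down_to_valid(size):
    """Largest side length <= size that satisfies the rule."""
    return max(STRIDE, (size // STRIDE) * STRIDE)


print(network_size(13))
print(round_down_to_valid(600))
```

Larger values of X cost more memory and compute per image, which is where the batch and subdivisions settings mentioned earlier come in.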

opengl es 2.0 etc1 powervr SGX 540

I have a problem with ETC1 textures. To load them I use my own code that reads the raw data of the ETC1 image, and then I upload the data to GPU memory with GLES20.glCompressedTexImage2D(GLES20.GL_TEXTURE_2D, 0, 0x8D64, textureWidth, textureHeight, 0, rawSize, data);
But on devices with a PowerVR SGX540 GPU, only textures with dimensions 512x512 are drawn correctly, and I don't understand why. The OpenGL ES 2.0 standard says that I can use textures with non-power-of-two dimensions. Please help me resolve this problem.
It is true that OpenGL ES 2.0 does not have the power of two restriction, however wrap modes and min filter are restricted. Please read the notes on
which states:
Similarly, if the width or height of a texture image are not powers of two and either the GL_TEXTURE_MIN_FILTER is set to one of the functions that requires mipmaps or the GL_TEXTURE_WRAP_S or GL_TEXTURE_WRAP_T is not set to GL_CLAMP_TO_EDGE, then the texture image unit will return (R, G, B, A) = (0, 0, 0, 1).
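The quoted rule can be expressed as a small check. This is a sketch in plain Python with the GL enums written as strings, and passes_npot_rule is a hypothetical helper. It only encodes the non-power-of-two condition quoted above, not the other reasons a texture can be incomplete.

```python
MIPMAP_MIN_FILTERS = {
    "GL_NEAREST_MIPMAP_NEAREST",
    "GL_LINEAR_MIPMAP_NEAREST",
    "GL_NEAREST_MIPMAP_LINEAR",
    "GL_LINEAR_MIPMAP_LINEAR",
}


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def passes_npot_rule(width, height, min_filter, wrap_s, wrap_t):
    """False if sampling would return (0, 0, 0, 1) under the quoted rule."""
    if is_power_of_two(width) and is_power_of_two(height):
        return True  # the restriction only applies to non-power-of-two sizes
    if min_filter in MIPMAP_MIN_FILTERS:
        return False
    return wrap_s == "GL_CLAMP_TO_EDGE" and wrap_t == "GL_CLAMP_TO_EDGE"


# Default texture state: GL_NEAREST_MIPMAP_LINEAR min filter and GL_REPEAT wrapping.
print(passes_npot_rule(512, 512, "GL_NEAREST_MIPMAP_LINEAR", "GL_REPEAT", "GL_REPEAT"))
print(passes_npot_rule(500, 300, "GL_NEAREST_MIPMAP_LINEAR", "GL_REPEAT", "GL_REPEAT"))
print(passes_npot_rule(500, 300, "GL_LINEAR", "GL_CLAMP_TO_EDGE", "GL_CLAMP_TO_EDGE"))
```

Because the default min filter uses mipmaps and the default wrap mode is GL_REPEAT, a non-power-of-two texture whose parameters were never changed fails this check, while a 512x512 one does not.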
I also recommend reading the answer and comments on this question: Can OpenGL ES render textures of non base 2 dimensions?