OpenGL power of two texture performance [duplicate]

I am creating an OpenGL video player using FFmpeg, and none of my videos have power-of-two dimensions (they are normal video resolutions). It runs at a fine frame rate on my NVIDIA card, but I've found that it won't run on older ATI cards because they don't support non-power-of-two textures.
I will only be using this on an NVIDIA card, so I don't care too much about the ATI problem, but I was wondering how much of a performance boost I'd get if the textures were power-of-two. Is it worth padding them out?
Also, if it is worth it, how do I go about padding them out to the nearest larger power-of-two?

When writing a video player, you should update your texture contents using glTexSubImage2D(). This function allows you to supply an arbitrarily sized image that will be placed somewhere within the target texture. So you first initialize the texture with a call to glTexImage2D() with the data pointer set to NULL, then fill in the data.
The performance gain of pure power-of-two textures depends strongly on the hardware used, but in extreme cases it can be up to 300%.
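As a minimal sketch of that approach, assuming RGBA frames (frame_w, frame_h and frame_pixels are placeholder names for whatever your FFmpeg decoding path hands you):

#include <GL/gl.h>

/* Allocate a power-of-two texture once; the data pointer is NULL, so only
 * storage is created. */
static GLuint create_video_texture(int frame_w, int frame_h)
{
    int tex_w = 1, tex_h = 1;
    while (tex_w < frame_w) tex_w <<= 1;   /* round up to the next power of two */
    while (tex_h < frame_h) tex_h <<= 1;

    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, tex_w, tex_h, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    return tex;
}

/* Stream each decoded frame into the top-left corner of the padded texture. */
static void upload_frame(GLuint tex, int frame_w, int frame_h,
                         const void *frame_pixels)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, frame_w, frame_h,
                    GL_RGBA, GL_UNSIGNED_BYTE, frame_pixels);
}

When drawing the quad, use texture coordinates that only go up to frame_w/tex_w and frame_h/tex_h so the padding never becomes visible; that also covers the question of how to pad out to the next larger power of two.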

Related

HoloLens external rendering

Does anyone have a good solution for external rendering for Microsoft HoloLens apps? To be specific: is it possible to have my laptop render a number of 3D objects that is too much for the HoloLens GPU and then display them on the HoloLens over Wi-Fi, including spatial mapping and interaction?
It's possible to render remotely, both directly from the Unity editor and from a built application.
While neither achieves your goal of a "good solution", they both allow very intensive applications to at least run at all.
This walks you through how to add it to an app you're building:
https://learn.microsoft.com/en-us/windows/mixed-reality/add-holographic-remoting
This is for running directly from the editor:
https://blogs.unity3d.com/2018/05/30/create-enhanced-3d-visuals-with-holographic-emulation-in-uwp/
I don't think this is possible, since you can't really access the OS or the processor at all on the HoloLens. Even if you do manage to send the data to a third party to process, the data will still need to be run back through the HoloLens, which leaves you with the same bottleneck as before.
You may find a way to hook up a VR backpack to it, but even then I highly doubt it would be possible.
If you are having trouble rendering 3D objects, you should reduce the number of triangles, use a lower-resolution shader, or reduce the size of the object. The biggest factor in processing 3D objects on the HoloLens is how much of the lens area is being drawn on. If your object takes up 25% of the view instead of 100%, it will be easier for the HoloLens to process.
Also, if you can't avoid having a lot of objects in the scene, check out LOD (level of detail), which renders objects at lower detail the farther they are from the camera and at higher detail the closer they are.

Check Device Type For GPUImage

GPUImage requires, for iPhone 4 and below, images smaller than 2048 pixels. The 4S and above can handle much larger. How can I check to see which device my app is currently running on? I haven't found anything in UIDevice that does what I'm looking for. Any suggestions/workarounds?
For this, you don't need to check device type, you simply need to read the maximum texture size supported by the device. Luckily, there is a built-in method within GPUImage that does this for you:
GLint maxTextureSize = [GPUImageContext maximumTextureSizeForThisDevice];
The above will give you the maximum texture size for the device you're running this on. That will determine the largest image size that GPUImage can work with on that device, and should be future-proof against whatever iOS devices come next.
This method works by caching the results of this OpenGL ES query:
glGetIntegerv(GL_MAX_TEXTURE_SIZE, &maxTextureSize);
if you're curious.
I should also note that you can provide images larger than the max texture size to the framework, but they get scaled down to the largest size supported by the GPU before processing. At some point, I may complete my plan for tiling subsections of these images in processing so that larger images can be supported natively. That's a ways off, though.
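If you want to perform the same check at the OpenGL ES level yourself, here is a rough sketch (it assumes a current GL context; the clamp helper is just illustrative):

#include <OpenGLES/ES2/gl.h>

/* Largest texture dimension the GPU accepts; an image wider or taller than
 * this has to be scaled down before it can be used as a texture. */
static GLint max_texture_size(void)
{
    GLint max_size = 0;
    glGetIntegerv(GL_MAX_TEXTURE_SIZE, &max_size);
    return max_size;
}

/* Shrink width/height proportionally so both fit within the GPU limit. */
static void clamp_to_gpu_limit(int *width, int *height)
{
    GLint limit = max_texture_size();
    int largest = (*width > *height) ? *width : *height;
    if (largest > limit) {
        double scale = (double)limit / (double)largest;
        *width  = (int)(*width  * scale);
        *height = (int)(*height * scale);
    }
}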
This is among the best device-detection libraries I've come across: https://github.com/erica/uidevice-extension
EDIT: The readme seems to suggest that the more up-to-date versions are in her "Cookbook" sources, though, so perhaps that is more current.
Here is a useful class that I have used several times in the past; it is very simple and easy to implement:
https://gist.github.com/Jaybles/1323251

Windows Low-Level Graphics [closed]

I'm new to programming. I know C/C++ and the basics of Win32, and I am now trying to do graphics, but I want the fastest connection to the screen. I realize most people go with OpenGL or DirectX, but I don't want the overhead; I want to start from scratch and control the pixel data. I know about GDI bitmaps, but I'm not sure whether that is the best access to the data. I know that I have to talk through Windows, which is the trouble. Do OpenGL and DirectX compile down to the level of GDI, or is there a special way they do it: do they bypass GDI or use similar code? Please don't ask why I want to do this. Maybe an explanation of how this is done would help, for example how Windows combines all the windows to create the final image.
The most direct access to pixel data is via shaders, which are supported by both OpenGL and Direct3D. They are cross-compiled and run directly on the video card; they do not run through OpenGL and carry no OpenGL overhead. OpenGL is just used to get them to the graphics card's own processor in the first place.
Anything you do on the CPU has to first be copied across the bus (typically PCI-express) to the video card. GDI is actually many levels removed from the graphics memory.
OpenGL, Direct3D, Direct2D, GDI, and GDI+ are all abstraction layers. The GPU vendor writes a driver that accepts these standard command functions, re-encodes the data in the card-specific format, then sends it to the card. Typically OpenGL and Direct3D are the most heavily optimized and also require the least amount of re-encoding.
How Windows combines the various on-screen windows to create the full-screen image depends heavily on which version of Windows you are talking about; DWM changed everything. Since DWM was introduced in Vista, programs render into their own private areas of GPU memory, and the window manager then uses the texture lookup units of the video card to efficiently layer each program's individual area onto the screen's primary buffer. When a program (usually a game) requests full-screen exclusive access, this step is skipped and the driver lets that application's rendering commands affect the primary screen buffer directly.
Assuming that the CPU is generating the data which needs to be displayed, the fastest and most efficient approach is likely to be block-copying that data into a vertex buffer object and using OpenGL commands to rasterize it as lines or polygons or whatever (or the Direct3D equivalent). If you previously thought that GDI was the low-level interface, you've got some reading ahead of you to make this work, but it will run several orders of magnitude faster than pure GDI. So much faster, in fact, that in the current architecture GDI (and WPF) are built on top of Direct2D and/or Direct3D.
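A rough sketch of that block-copy path, assuming an OpenGL context with buffer-object support (GLEW is used here for the entry points, and the two-float vertex layout and GL_STREAM_DRAW hint are just assumptions):

#include <GL/glew.h>

/* Copy CPU-generated x,y vertex pairs into a buffer object and rasterize
 * them as a line strip. Assumes a current compatibility-profile context. */
static void draw_cpu_generated_lines(const float *xy_pairs, int vertex_count)
{
    GLuint vbo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);

    /* block-copy the CPU data across the bus into GPU-side storage */
    glBufferData(GL_ARRAY_BUFFER, (GLsizeiptr)vertex_count * 2 * sizeof(float),
                 xy_pairs, GL_STREAM_DRAW);

    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(2, GL_FLOAT, 0, (const void *)0);  /* offset into the VBO */
    glDrawArrays(GL_LINE_STRIP, 0, vertex_count);
    glDisableClientState(GL_VERTEX_ARRAY);

    glDeleteBuffers(1, &vbo);
}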
but I want the fastest connection to the screen
I want to start from scratch and control the pixel data
You're asking for the impossible. You get the best performance when you use GPU-accelerated functions, but in that case you don't get direct access to the pixel data, and trying to access it (reading it back or writing it) will hurt performance, because the data has to be transferred between system memory and video memory. As a result, anything that is streamed from system memory to video memory should be handled with care. Plus, you'll have to learn the API.
If you "start from scratch" and do the rendering on the CPU, you'll get easy access to the pixel data and full control over the rendering, but performance will be inferior to the GPU (the CPU is less suited to parallel processing, and system memory can be an order of magnitude slower than video memory), and you'll spend a significant amount of time reinventing the wheel.
Do Opengl and DirectX compile down to the level of GDI or is there a special way they do it, do they bypass or use similar code?
No. They communicate with the graphics hardware nearly directly, using drivers provided by the hardware manufacturer. And those "direct hardware access" interfaces used by DirectX/OpenGL won't be available to you: they're hardware-specific and manufacturer-specific, may be internal, and are possibly even protected by patents.
There are, of course, a few legacy hardware interfaces that ARE available to you (namely VESA and VGA mode 13h); however, their direct use is normally forbidden by the operating system (you can't easily access VESA on Windows), so to use them you would have to boot MS-DOS, use a custom operating system, or use helper libraries (such as SVGAlib on Linux), which may only work with root privileges. And of course, even if you did use VESA/VGA to render something yourself, on any hardware newer than a RivaTNT 2 Pro the performance would be horrible compared to hardware-accelerated rendering done by OpenGL/DirectX. Have you ever seen how slowly Windows XP runs when it doesn't have a proper GPU driver (it takes a second to redraw a window)? That's how fast it is going to be with direct VESA/VGA access.
Please, Don't ask why I want to do this.
It makes sense to ask why you would want to do this. Your "I want direct low-level access" approach was suitable maybe 15-20 years ago, in the DOS era. These days the reasonable solution is to use an existing API (one maintained by somebody who isn't you) and find a way to utilize it fully. Of course, if you wanted to develop drivers, that would be another story.
Do Opengl and DirectX compile down to the level of GDI or is there a special way they do it
and
I realize most are going with Opengl or DirectX. But, I don't want the overhead
So what you're saying is that you have absolutely no clue what OpenGL or DirectX actually do, and yet you've decided that they are not efficient enough for your needs.
I'm sorry, but this is nonsense. It is impossible to answer a question like that.
In the real world, you have a small supercomputer dedicated to doing graphics. And you get access to it through OpenGL and DirectX.
And the reason they are fast is that they do NOT just "start from scratch and control the pixel data".
So please, if you want serious answers, let those with the knowledge to answer your questions decide which question is best.
The correct answer, if you want efficient graphics, is to use DirectX or OpenGL.

recommended limit for memory management in Cocos2d?

Is there a recommended limit for image sizes in Cocos2d, beyond which they become too big and take up too much memory? Are there any rules of thumb, in dimensions or in KB, for avoiding slowing the game down (for the background image, or for my characters' graphics, even if I use a batch node)?
Thanks for your answer
First of all, memory usage has very, very, very little to do with performance. You can fill up the entire memory with textures and the game won't care. It's when you render them that there will be a difference, and then what matters is how much of the screen area you're filling with textures and how heavily they're overlaid, batched, rotated, scaled, shaded and alpha-blended. Those are the main factors in texture rendering performance; memory usage plays a very insignificant role.
You may be interested in the cocos2d sprite-batch performance test I did and the general cocos2d performance analysis. Both come with test projects.
As for the maximum texture sizes, have a look at the table from my Learn Cocos2D book.
Note that iPhone and iPhone 3G devices have a 24 MB texture memory limit. 3rd generation (iPhone 3GS) and newer devices don't have that limit anymore. Also keep in mind that while a device may have 256 MB of memory installed, significantly less memory will be available for use by apps.
For example, on the iPad (1st gen) it is recommended not to use more than 100 MB of memory, with a maximum available memory peaking at around 125 MB and memory warning starting to show as early as around 80-90 MB memory usage.
With iOS 5.1 Apple also increased the maximum texture size of the iPad 2. The safest and most commonly usable texture size is 2048x2048 for Retina textures, and 1024x1024 for standard resolution textures.
Not in the table are iPod touch devices because they're practically identical to the iPhone models of the same generation, but not as easily identifiable. For example the iPod touch 3rd generation includes devices with 8, 16 and 32GB of flash memory, but the 8GB model is actually 2nd generation hardware.
The dimensional size of images and textures depends on the devices you are supporting. Older devices supported smaller textures, I think 2048x2048 in size. I don't think such a limitation exists on current devices.
For large images, you definitely want to use batch nodes as they have been tested to demonstrate the largest performance gain when dealing with large images. Though it is a good idea to use them for as much as possible in general.
As for how much you can load, it really depends on the device. The new iPad has 1 GB of memory and is designed to have much more available memory for large images. A first-gen iPad has 1/4 this amount of memory, and in my experience I start to see an app crash when it gets around 100 MB of memory used (as confirmed using Instruments).
The trick is to use only as much memory as you need for the current app's operation, then release it when you move to a new scene or new set of images/sprites/textures. You could for example have very large tiled textures where only the tiles nearest the viewport are loaded into memory. You could technically have an infinite sized view that stretches forever if you remove from memory those parts of the view that are not visible onscreen.
And of course when dealing with lots of resources, make sure your app delegate responds appropriately to its memory warnings.
In my experience, a 1024x1024 batch node takes around 4 MB of texture memory alone (1024 x 1024 pixels x 4 bytes per pixel for RGBA8888), and the application has a texture memory limit of about 24 MB. The game slows down as you approach that 24 MB limit and crashes once you go past it. To avoid the slowdown I used at most 4 batch nodes at a time, i.e. 16 MB, leaving the remaining 8 MB for variables and other data. Before adding more batch nodes, I would clean up memory and remove the unused ones. I don't know the memory limit on the 4S, but this is what I learned on the iPhone 4.
Keeping this logic in mind, my game used to run smoothly.
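The arithmetic behind that budget, as a quick sketch (RGBA8888 assumed; a 16-bit pixel format such as RGB565 would halve these numbers):

#include <stdio.h>

/* Bytes used by an uncompressed RGBA8888 texture of the given size. */
static unsigned long texture_bytes(unsigned width, unsigned height)
{
    return (unsigned long)width * height * 4;   /* 4 bytes per pixel */
}

int main(void)
{
    unsigned long per_batch = texture_bytes(1024, 1024);   /* 4 MB */
    unsigned long budget    = 24UL * 1024 * 1024;          /* ~24 MB limit */

    printf("one 1024x1024 batch node: %lu MB\n", per_batch >> 20);
    printf("batch nodes that fit the budget: %lu\n", budget / per_batch);
    /* 6 would fit exactly; using only 4 leaves ~8 MB of headroom */
    return 0;
}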

Is it possible to do GPU programming if I have an integrated graphics card?

I have an HP Pavilion laptop whose so-called graphics card is some sort of integrated NVIDIA chip running on shared memory. To give you an idea of its capabilities: if a video game was made in the last 5 years at a cost of more than a couple million dollars, it just won't be playable on my computer.
Anyway, I was wondering if I could do GPU programming, like CUDA, on this thing. I don't expect it to be fast; I'd just like to get the experience without setting my laptop on fire in the meantime.
Find out what GPU your laptop is, and compare it against this list: http://en.wikipedia.org/wiki/CUDA#Supported_GPUs. Most likely, CUDA will not be supported.
This doesn't necessarily prevent you from doing "GPU programming", however. If the GPU supports fragment and vertex shaders, you can use the graphics pipeline to send data to the card (for example, as texture data) and do your processing in a fragment shader. You then read from the pixel buffer to get the data back into system memory. Though hackish, this approach was quite popular before CUDA and other frameworks like OpenCL were introduced.
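An outline of that texture-based trick, assuming a desktop GL context with framebuffer-object and float-texture support and an already compiled program whose fragment shader does the per-element math (all names here are placeholders, and older GPUs may force you down to 8-bit textures):

#include <GL/glew.h>

/* Upload input data as a texture, run a fragment shader over a full-screen
 * quad into an FBO, then read the results back with glReadPixels. */
void process_on_gpu(GLuint program, const float *input, float *output,
                    int width, int height)
{
    /* 1. Input data becomes a floating-point texture. */
    GLuint in_tex;
    glGenTextures(1, &in_tex);
    glBindTexture(GL_TEXTURE_2D, in_tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0,
                 GL_RGBA, GL_FLOAT, input);

    /* 2. Output texture attached to a framebuffer object. */
    GLuint out_tex, fbo;
    glGenTextures(1, &out_tex);
    glBindTexture(GL_TEXTURE_2D, out_tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0,
                 GL_RGBA, GL_FLOAT, NULL);
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, out_tex, 0);

    /* 3. Full-screen quad: the fragment shader runs once per output texel. */
    glViewport(0, 0, width, height);
    glUseProgram(program);
    glBindTexture(GL_TEXTURE_2D, in_tex);
    glBegin(GL_QUADS);                       /* immediate mode, for brevity */
    glTexCoord2f(0, 0); glVertex2f(-1, -1);
    glTexCoord2f(1, 0); glVertex2f( 1, -1);
    glTexCoord2f(1, 1); glVertex2f( 1,  1);
    glTexCoord2f(0, 1); glVertex2f(-1,  1);
    glEnd();

    /* 4. Read the processed data back into system memory. */
    glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT, output);

    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glDeleteFramebuffers(1, &fbo);
    glDeleteTextures(1, &in_tex);
    glDeleteTextures(1, &out_tex);
}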