Is there a way to do an opacity mask in the .NET Compact Framework?
I don't think the CF version of .NET supports opacity at all. That makes some sense, too: calculating opacity is a very expensive operation, one not well suited to underpowered devices.
That said, here is a possible workaround.
You can roll your own opacity mask for a Bitmap by simply doing the pixel-by-pixel manipulations yourself. The super-slow way to do this would be to use the Bitmap's GetPixel and SetPixel methods, but a much faster way is to use the LockBits method. See:
Bob Powell: Locking Bits
Depending on what exactly you're doing, Windows Mobile devices have a surprising amount of processing power for this sort of thing. Iterating through and processing the 76,800 pixels of a 320x240 Bitmap using LockBits takes just a few milliseconds (depending on how complex the "processing" is, of course). Opacity masking (presumably) involves just comparing two pixel values and possibly calculating a third, so this would be no problem.
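For illustration, here is a rough C++ sketch of that per-pixel math, written as a plain loop over 32-bit ARGB buffers of the kind LockBits exposes; the function name and the convention of using the mask's red channel as the opacity value are my own assumptions, not anything from the CF API:

#include <cstdint>
#include <cstddef>

// Blend `src` over `dst`, using the mask's red channel as per-pixel opacity.
// All three buffers are assumed to be 32bpp ARGB and `count` pixels long;
// in .NET CF you would run the equivalent loop over the pointers LockBits returns.
void ApplyOpacityMask(uint32_t* dst, const uint32_t* src,
                      const uint32_t* mask, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        uint32_t a = (mask[i] >> 16) & 0xFF;   // mask red channel = opacity 0..255
        uint32_t d = dst[i], s = src[i];

        // out = (src * a + dst * (255 - a)) / 255, per color channel
        uint32_t r = (((s >> 16) & 0xFF) * a + ((d >> 16) & 0xFF) * (255 - a)) / 255;
        uint32_t g = (((s >> 8) & 0xFF) * a + ((d >> 8) & 0xFF) * (255 - a)) / 255;
        uint32_t b = ((s & 0xFF) * a + (d & 0xFF) * (255 - a)) / 255;

        dst[i] = (d & 0xFF000000) | (r << 16) | (g << 8) | b;   // keep dst alpha
    }
}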
It isn't clear to me when it's a good idea to use VK_IMAGE_LAYOUT_GENERAL as opposed to transitioning to the optimal layout for whatever action I'm about to perform. Currently, my policy is to always transition to the optimal layout.
But VK_IMAGE_LAYOUT_GENERAL exists. Maybe I should be using it when I'm only going to use a given layout for a short period of time.
For example, right now, I'm writing code to generate mipmaps using vkCmdBlitImage. As I loop through the sub-resources performing the vkCmdBlitImage commands, should I transition to VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL as I scale down into a mip, then transition to VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL when I'll be the source for the next mip before finally transitioning to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL when I'm all done? It seems like a lot of transitioning, and maybe generating the mips in VK_IMAGE_LAYOUT_GENERAL is better.
I appreciate the answer might be to measure, but it's hard to measure on all my target GPUs (especially because I haven't got anything running on Android yet) so if anyone has any decent rule of thumb to apply it would be much appreciated.
FWIW, I'm writing Vulkan code that will run on desktop GPUs and Android, but I'm mainly concerned about performance on the latter.
You would use it when:
1. You are lazy
2. You need to map the memory to host (unless you can use PREINITIALIZED)
3. You use the image as multiple incompatible attachments and have no choice
4. For storage images
5. Other cases where you would otherwise switch layouts too often (and don't even need barriers) relative to the work done on the images. Measurement is needed to confirm GENERAL is better in that case; most likely a premature optimization even then.
PS: You could transition all the mip levels together to TRANSFER_DST with a single command beforehand, and then transition only the level you need to SRC. With a decent HDD, it would arguably be even better to store the images with their mipmaps already generated, if that's an option (and perhaps even get better quality by using a more sophisticated filter offline).
PS2: Too bad there's no dedicated mipmap-creation command. vkCmdBlitImage most likely does something like it under the hood anyway when the destination is smaller than half the source resolution...
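To make the PS concrete, here is a minimal C++ sketch of that barrier pattern for a single-layer color image; the function name and its parameters are illustrative, and the blit commands themselves are omitted:

#include <vulkan/vulkan.h>

// One barrier moves every mip level to TRANSFER_DST_OPTIMAL up front; afterwards
// each level is flipped to TRANSFER_SRC_OPTIMAL only when it becomes the source
// of the next blit. Assumes level 0 gets filled (e.g. copied from a staging
// buffer) after the first barrier, since UNDEFINED discards previous contents.
void RecordMipTransitions(VkCommandBuffer cmd, VkImage image, uint32_t mipLevels)
{
    VkImageMemoryBarrier barrier{};
    barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    barrier.srcAccessMask = 0;
    barrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
    barrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
    barrier.newLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.image = image;
    barrier.subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, mipLevels, 0, 1 };

    // All mip levels -> TRANSFER_DST in a single command.
    vkCmdPipelineBarrier(cmd,
        VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT,
        0, 0, nullptr, 0, nullptr, 1, &barrier);

    for (uint32_t i = 1; i < mipLevels; ++i) {
        // Level i-1 has just been written; make it the blit source for level i.
        barrier.subresourceRange.baseMipLevel = i - 1;
        barrier.subresourceRange.levelCount = 1;
        barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
        barrier.newLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL;
        barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
        barrier.dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
        vkCmdPipelineBarrier(cmd,
            VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT,
            0, 0, nullptr, 0, nullptr, 1, &barrier);

        // ... vkCmdBlitImage from level i-1 (SRC_OPTIMAL) into level i (DST_OPTIMAL) ...
    }
}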
If you read from the mipmap[n] image to create the mipmap[n+1] image, then you should use the transfer layouts if you want your code to run on all Vulkan implementations and get the most performance across all of them, as the layout may be used by the GPU to optimize the image for reads or writes.
So if you want to stay cross-vendor, only use VK_IMAGE_LAYOUT_GENERAL for setting up the descriptor that uses the final image, and not for the image reads or writes.
If you don't want to use that many transitions you may copy from a buffer instead of an image, though you obviously wouldn't get the format conversion, scaling and filtering that vkCmdBlitImage does for you for free.
Also don't forget to check if the target format actually supports the BLIT_SRC or BLIT_DST bits. This is independent of whether you use the transfer or general layout for copies.
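For example, that check could look something like this (the helper name is illustrative):

#include <vulkan/vulkan.h>

// Returns true if `format` supports being both blit source and blit destination
// for optimal-tiling images on this physical device.
bool SupportsBlit(VkPhysicalDevice physicalDevice, VkFormat format)
{
    VkFormatProperties props;
    vkGetPhysicalDeviceFormatProperties(physicalDevice, format, &props);
    const VkFormatFeatureFlags needed =
        VK_FORMAT_FEATURE_BLIT_SRC_BIT | VK_FORMAT_FEATURE_BLIT_DST_BIT;
    return (props.optimalTilingFeatures & needed) == needed;
}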
I'm trying to use WebGL to speed up computations in a simulation of a small quantum circuit, like what the Quantum Computing Playground does. The problem I'm running into is that readPixels takes ~10ms, but I want to call it several times per frame while animating in order to get information out of gpu-land and into javascript-land.
As an example, here's my exact use case. The following circuit animation was created by computing things about the state between each column of gates, in order to show the inline-with-the-wire probability-of-being-on graphing:
The way I'm computing those things now, I'd need to call readPixels eight times for the above circuit (once after each column of gates). This is waaaaay too slow at the moment, easily taking 50ms when I profile it (bleh).
What are some tricks for speeding up readPixels in this kind of use case?
Are there configuration options that significantly affect the speed of readPixels? (e.g. the pixel format, the size, not having a depth buffer)
Should I try to make the readPixel calls all happen at once, after all the render calls have been made (maybe allows some pipelining)?
Should I try to aggregate all the textures I'm reading into a single megatexture and sort things out after a single big read?
Should I be using a different method to get the information back out of the textures?
Should I be avoiding getting the information out at all, and doing all the layout and rendering gpu-side (urgh...)?
Should I try to make the readPixel calls all happen at once, after all the render calls have been made (maybe allows some pipelining)?
Yes, yes, yes. readPixels is fundamentally a blocking, pipeline-stalling operation, and it is always going to kill your performance wherever it happens, because it's sending a request for data to the GPU and then waiting for it to respond, which normal draw calls don't have to do.
Do readPixels as few times as you can (use a single combined buffer to read from). Do it as late as you can. Everything else hardly matters.
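The question is about WebGL/JavaScript, but to sketch the idea in the same C-style GL used elsewhere on this page (gl.readPixels behaves the same way): render each column's results into its own row of one small render target, then issue a single read at the very end. All names below are illustrative.

#include <cstdint>
#include <vector>
#include <GLES2/gl2.h>

// Read every column's results in one call, after all draw calls have been issued.
// Assumes each of the `numColumns` results was rendered into its own row of the
// RGBA render target attached to `fbo`.
std::vector<uint8_t> ReadAllResultsOnce(GLuint fbo, int resultWidth, int numColumns)
{
    std::vector<uint8_t> results(static_cast<size_t>(resultWidth) * numColumns * 4);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    // One blocking read for everything, instead of one pipeline stall per column.
    glReadPixels(0, 0, resultWidth, numColumns, GL_RGBA, GL_UNSIGNED_BYTE, results.data());
    return results;
}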
Should I be avoiding getting the information out at all, and doing all the layout and rendering gpu-side (urgh...)?
This will get you immensely better performance.
If your graphics are all like you show above, you shouldn't need to do any “layout” at all (which is good, because it'd be very awkward to implement) — everything but the text is some kind of color or boundary animation which could easily be done in a shader, and all the layout can be just a static vertex buffer (each vertex has attributes which point at which simulation-state-texel it should be depending on).
The text will be more tedious merely because you need to load all the digits into a texture to use as a spritesheet and do the lookups into that, but that's a standard technique. (Oh, and divide/modulo to get the digits.)
I don't know enough about your use case, so I'm just guessing: why do you need to call readPixels at all?
First, you don't need to draw text or the static parts of your diagram in WebGL. Put another canvas, svg, or img over the WebGL canvas and set the CSS so they overlap. Let the browser composite them; then you don't have to do it.
Second, let's assume you have a texture with your computed results in it. Can't you just make some geometry that matches the places in your diagram that need colors, and use texture coordinates to look up the results from the correct places in the results texture? Then you don't need to call readPixels at all. That shader can use a ramp-texture lookup or any other technique to convert the results to other colors to shade the animated parts of your diagram.
If you want to draw numbers based on the result, you can use a technique like this: make a shader that references the results texture, looks up a result value, and then indexes glyphs from another texture based on it.
Am I making any sense?
As the title says, I'm fleshing out a design for a 2D platformer engine. It's still in the design stage, but I'm worried that I'll be running into issues with the renderer, and I want to avoid them if they will be a concern.
I'm using SDL for my base library, and the game will be set up to use a single large array of Uint16 to hold the tiles. These index into a second array of "tile definitions" that are used by all parts of the engine, from collision handling to the graphics routine, which is my biggest concern.
The graphics engine is designed to run at a 640x480 resolution, with 32x32 tiles. There are 21x16 tiles drawn per layer per frame (to handle the extra tile that shows up when scrolling), and there are up to four layers that can be drawn. Layers are simply separate tile arrays, but the tile definition array is common to all four layers.
What I'm worried about is that I want to be able to take advantage of transparencies and animated tiles with this engine, and as I'm not too familiar with designs I'm worried that my current solution is going to be too inefficient to work well.
My target FPS is a flat 60 frames per second, and with all four layers being drawn, I'm looking at 21x16x4x60 = 80,640 separate 32x32px tiles needing to be drawn every second, plus however many odd-sized blits are needed for sprites, and this seems just a little excessive. So, is there a better way to approach rendering the tilemap setup I have? I'm looking towards possibilities of using hardware acceleration to draw the tilemaps, if it will help to improve performance much. I also want to hopefully be able to run this game well on slightly older computers as well.
If I'm looking for too much, then I don't think that reducing the engine's capabilities is out of the question.
I think the issue will be the sheer number of draw calls, rather than the total "fill rate" of all the pixels you are drawing. Remember, that is over 80,000 calls per second that you must make. I think your biggest improvement will come from batching these together somehow.
One strategy to reduce the fill rate of the tiles and layers would be to composite static areas together. For example, if you know an area doesn't need updating, it can be cached. A lot depends on whether the layers are scrolled independently (parallax style).
Also, have a look on Google for "dirty rectangles" and see if any of those schemes fit your needs.
Personally, I would just try it and see. This probably won't affect your overall game design, and if you have good separation between logic and presentation, you can optimise the tile drawing til the cows come home.
Make sure to use alpha transparency only on tiles that actually use alpha, and skip drawing blank tiles. Make sure the tile surface color depth matches the screen color depth when possible (not really an option for tiles with an alpha channel), and store tiles in video memory so SDL will use hardware acceleration when it can. Color-key transparency will be faster than a full alpha channel for simple tiles where partial transparency or blending antialiased edges with the background isn't necessary.
On a 500 MHz system you'll get about 6.8 CPU cycles per pixel per layer, or 27 per screen pixel, which (I believe) isn't going to be enough if you have full alpha channels on every tile of every layer, but should be fine if you take shortcuts like those mentioned where possible.
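As a rough illustration of those shortcuts, here is a minimal sketch using the SDL 1.2 API the question implies; the function name and the choice of magenta as the color key are assumptions:

#include <SDL/SDL.h>

// Prepare a tile for fast blitting: color-key transparency instead of a full
// alpha channel, and a pixel format matching the display so SDL doesn't have
// to convert on every blit.
SDL_Surface* PrepareTile(SDL_Surface* loadedTile)
{
    // Treat pure magenta as transparent; cheaper than per-pixel alpha.
    SDL_SetColorKey(loadedTile, SDL_SRCCOLORKEY | SDL_RLEACCEL,
                    SDL_MapRGB(loadedTile->format, 255, 0, 255));

    // Convert to the screen's color depth (use SDL_DisplayFormatAlpha instead
    // for the tiles that genuinely need an alpha channel).
    SDL_Surface* converted = SDL_DisplayFormat(loadedTile);
    if (converted != NULL) {
        SDL_FreeSurface(loadedTile);
        return converted;
    }
    return loadedTile;
}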
I agree with Kombuwa. If this is just a simple tile-based 2D game, you really ought to lower the standards a bit, as this is not Crysis. 30 FPS is very smooth (see Command & Conquer 3, which is limited to 30 FPS). Still, I once wrote a remote desktop viewer that ran at 14 FPS (1900x1200) using GDI+, and it was still pretty smooth. I think that for your 2D game you'll probably be okay, especially using SDL.
Can you just buffer each complete layer into a surface the size of its view plus one extra tile on each edge (if you have vertical scrolling too), then when scrolling reuse that buffer to create a new one minus the first column, drawing only the new end column?
This would reduce a lot of needless redrawing.
Additionally, if you want 60 fps, you can look into frame-skip methods for slower systems, skipping every other or every third draw phase.
I think you will be pleasantly surprised by how many of these tiles you can draw a second. Modern graphics hardware can fill a 1600x1200 framebuffer numerous times per frame at 60 fps, so your 640x480 framebuffer will be no problem. Try it and see what you get.
You should definitely take advantage of hardware acceleration. This will give you 1000x performance for very little effort on your part.
If you do find you need to optimise, then the simplest way is to only redraw the areas of the screen that have changed since the last frame. Sounds like you would need to know about any animating tiles, and any tiles that have changed state each frame. Depending on the game, this can be anywhere from no benefit at all, to a massive saving - it really depends on how much of the screen changes each frame.
You might consider merging neighbouring tiles with the same texture into a larger polygon with texture tiling (sort of a build process).
What about decreasing the frame rate to 30 fps? I think it will be good enough for a 2D game.
I'm drawing quads in openGL. My question is, is there any additional performance gain from this:
// Method #1
glBegin(GL_QUADS);
// Define vertices for 10 quads
glEnd();
... over doing this for each of the 10 quads:
// Method #2
glBegin(GL_QUADS);
// Define vertices for first quad
glEnd();
glBegin(GL_QUADS);
// Define vertices for second quad
glEnd();
//etc...
All of the quads use the same texture in this case.
Yes, the first is faster, because each call to glBegin or glEnd changes the OpenGL state.
Even better, however, than one call to glBegin and glEnd (if you have a significant number of vertices), is to pass all of your vertices with glVertexPointer (and friends), and then make one call to glDrawArrays or glDrawElements. This will send all your vertices to the GPU in one fell swoop, instead of incrementally by calling glVertex3f repeatedly.
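For example, a minimal sketch of that approach with the fixed-function vertex-array calls, assuming the quad corners are packed four vertices per quad, three floats per vertex:

#include <GL/gl.h>

// Draw `quadCount` quads from one packed vertex array with a single draw call.
void DrawQuads(const GLfloat* vertices, GLsizei quadCount)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, vertices);   // one pointer for all quads
    glDrawArrays(GL_QUADS, 0, quadCount * 4);    // one call instead of many glVertex3f
    glDisableClientState(GL_VERTEX_ARRAY);
}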
From a function-call-overhead perspective, the second approach is more expensive. If instead of ten quads we used ten thousand, then glBegin/glEnd would be called ten thousand times per frame instead of once.
More importantly, glBegin/glEnd have been deprecated as of OpenGL 3.0, and are not supported by OpenGL ES.
Instead, vertices are uploaded as vertex arrays using calls such as glDrawArrays. Tutorials and much more in-depth information can be found on the NeHe site.
I decided to go ahead and benchmark it using a loop of 10,000 quads.
The results:
Method 1: 0.0128 seconds
Method 2: 0.0132 seconds
Method #1 does have some improvement, but the improvement is very marginal (3%). It's probably nothing more than the overhead of simply calling more functions. So it's likely that OpenGL itself doesn't get any additional optimization from Method #1.
This is on Windows XP Service Pack 3, using OpenGL 2.0 and Visual Studio 2005.
I believe the answer is yes, but you should try it out yourself. Write something that draws 100k quads and see if one is much faster. Then report your results here :)
schnaader: What the document you read means is that you should not have non-GL-related code between glBegin and glEnd. It does not mean that you should prefer many short glBegin/glEnd blocks over one long one.
I suppose that you get the highest performance gain by reusing the vertices. To achieve that, you would need to maintain some structure for primitives yourself.
You would get better performance for sure, simply in terms of how much code gets called by the CPU.
Whether or not your drawing performance would be better on the GPU, that would completely depend on the implementation of the driver for your 3d graphics card. You could get potentially wildly different results with a different manufacturer's driver and even with a different version of the driver for the same card.
Initial tests indicate that GDI+ (writing in VB.NET) is not fast enough for my purposes. My application needs to be able to draw tens of thousands of particles (coloured circles, very preferably anti-aliased) in a full screen resolution at 20+ frames per second.
I'm hesitant to step away from GDI+ since I also require many of the other advanced drawing features (dash patterns, images, text, paths, fills) of GDI+.
Looking for good advice about using OpenGL, DirectX or other platforms to speed up particle rendering from within VB.NET. My app is strictly 2D.
Goodwill,
David
If you want to use VB.NET, then you can go with XNA or SlimDX.
I have some experience in creating games with GDI+ and XNA, and I can understand that GDI+ is giving you trouble.
If I were you I'd check out XNA; it's much faster than GDI+ because it actually uses your video card for drawing, and it has a lot of good documentation and examples online.
SlimDX also looks good but I don't have any experience with it. SlimDX is basically the DirectX API for .NET.
The only way to get the speed you need is to move away from software rendering to hardware rendering... and unfortunately that does mean moving to OpenGL or DirectX.
The alternative is to try and optimise your graphics routines to only draw the particles that need to be drawn, not the whole screen/window.
I would agree with JaredPar that you're better off profiling first to determine if your existing codebase can be improved before making a huge switch to a new framework. DirectX is not the easiest framework if you're unfamiliar with it.
The most significant speed increase I found, when writing a game maker with GDI+, was to convert my bitmaps to Format32bppPArgb:
SuperFastBitmap = ConvertImagePixelFormat(SlowBitmap, Imaging.PixelFormat.Format32bppPArgb)
If they are not in this format already, you'll see the difference immediately when you convert.
It's possible the problem is in your algorithm and not GDI+. Profiling is the only way to know for sure. Without a profile it's very possible you will switch to a new GUI framework and hit the exact same problems.
If you did profile, what part of GDI+ was causing a problem?
As Jared said, it could be that a significant fraction of your cycles are not going into GDI, and you might be able to reduce those.
A simple way to find those is to halt it at random a few times and examine the stack. The chance that you will catch it in the act of wasting time is equal to the fraction of time being wasted.
Any instruction or call instruction that appears on more than one such sample is something that, if you could replace it, you would see a speedup.
In general, that is the random-pausing method.
As you're working in VB.NET, have you tried using WPF (part of .NET since 3.0)? As WPF is based on DirectX rather than GDI+, it should give you the speed you need, although developing with WPF is not straightforward at all.
GDI+ is not accelerated by the graphics card; it's slow because it renders on the CPU. You could at least use DirectX or SlimDX instead.
See this: http://msdn.microsoft.com/en-us/library/windows/desktop/ff729480%28v=vs.85%29.aspx
http://www.codeproject.com/Articles/159586/Starting-DirectX-with-Visual-Basic-NET