Windows Low-Level Graphics [closed] - api

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I'm new to programming. I do know C/C++ and the basics of Win32. I am now trying to do graphics, but I want the fastest connection to the screen. I realize most are going with Opengl or DirectX. But, I don't want the overhead. I want to start from scratch and control the pixel data. I know about GDI bitmap, but I'm not sure if this is best access to the data. I know that I have to talk through windows, which is the trouble. Do Opengl and DirectX compile down to the level of GDI or is there a special way they do it, do they bypass or use similar code? Please, Don't ask why I want to do this. Maybe an explanation of how this is done might help. Like how windows combines all windows to create the final image.

The most direct access to pixel data is via shaders, which are supported by both OpenGL and Direct3D. They are cross-compiled and run directly on the video card. They do not use OpenGL, they do not have OpenGL overhead. OpenGL is just used to get them to the graphics card's own processor in the first place.
Anything you do on the CPU has to first be copied across the bus (typically PCI-express) to the video card. GDI is actually many levels removed from the graphics memory.
OpenGL, Direct3D, Direct2D, GDI, and GDI+ are all abstraction layers. The GPU vendor writes a driver that accepts these standard command functions, re-encodes the data in the card-specific format, then sends it to the card. Typically OpenGL and Direct3D are the most heavily optimized and also require the least amount of re-encoding.
How Windows combines the various on-screen windows to create the full-screen image depends heavily on what version of Windows you are talking about. DWM changed everything. Since DWM was introduced in Vista, programs render to their personal areas of GPU memory, then the window manager uses the texture lookup units of the video card to efficiently layer each of the programs' individual areas onto the screen primary buffer. When a program (usually a game) requests full-screen exclusive access, this step is skipped and the driver causes rendering commands from that application to affect the primary screen buffer directly.
Assuming that the CPU is generating the data which needs to be displayed, the fastest and most efficient approach is likely to be block-copying that data into a vertex buffer object and using OpenGL commands to rasterize it as lines or polygons or whatever (or the Direct3D equivalent). If you previously thought that GDI was the low-level interface, you've got some reading ahead of you to make this work. But it will run several orders of magnitude faster than pure GDI. So much faster, in fact, that the new architecture is that GDI (and WPF) is built on top of Direct2D and/or Direct3D.

but I want the fastest connection to the screen
I want to start from scratch and control the pixel data
You're asking for the impossible. You get best performance when you use GPU-accelerated functions. However, in this case you don't get direct access to pixel data, and trying to access it (read it back or write) will negatively impact the performance, because you'll have to transfer data from system memory to video memory. As a result anything that is being streamed from system memory to video memory should be handled with care. Plus you'll have to study API.
If you "start from scratch" and do rendering on CPU, you'll get easy access to pixel data and full control over the rendering, but performance will be inferior to GPU (CPU is less suitable for parallel processing, and system memory can be slower by order of magnitude than video memory), plus you'll spend significant amount of time reinventing the wheel.
Do Opengl and DirectX compile down to the level of GDI or is there a special way they do it, do they bypass or use similar code?
No. They communicate with graphic hardware nearly directly using drivers provided by hardware manufacturer. And those "direct hardware access" interfaces used by DirectX/OpenGL won't be available to you - they're hardware-specific and manufacturer specific, can be internal and possibly even protected by patents.
There are, of course, few legacy hardware interfaces which ARE available to you (namely VESA or VGA 13h mode), however, their direct use is normally forbidden by operating system (you can't easily access VESA on windows), so to access them you'll have to either boot MS-DOS, use custom operating system, or helper classes (such as SVGAlib on linux) which might only function under root privilegies. And of course, even if you actually use VESA/VGA to render something yourself, on any hardware (newer than RivaTNT 2 Pro) performance will be horrible compared to hardware-accelerated rendering done by OpenGL/DirectX. Have you ever seen how fast windows xp works when it doesn't have proper GPU driver (takes a second to redraw window)? that's how fast it is going to work with direct VESA/VGA access.
Please, Don't ask why I want to do this.
It makes sense to ask why you would want to do that. Your "I want direct low-level access" approach was suitable maybe 15..20 years ago or in DOS era. Right now reasonable solution would be to use existing API (that is maintained by somebody who isn't you) and search for a way to fully utilize it. Of course, if you wanted to develop drivers, that would be another story.

Do Opengl and DirectX compile down to the level of GDI or is there a special way they do it
and
I realize most are going with Opengl or DirectX. But, I don't want the overhead
So what you're saying is **you have absolutely no clue what OpenGL or DirectX actually do, and yet you've decided that they are not efficient enough for your needs.
I'm sorry, but this is nonsense. It is impossible to answer a question like that.
In the real world, you have a small supercomputer dedicated to doing graphics. And you get access to it through OpenGL and DirectX.
And the reason they are fast is that they do NOT just "start from scratch and control the pixel data".
So please, if you want serious answers, try letting those with the knowledge to answer your questions decide which question is best.
The correct answer, if you want efficient graphics, is to use DirectX or OpenGL.

Related

is it recommended to use SPI flash to run code instead internal flash due to memory limitation of internal flash?

We used the LPC546xx family microcontroller in our project, currently, at the initial stage, we are finalizing the software and hardware requirements. The basic firmware size (which contains RTOS, 3rd party stack, library, etc...) currently is 480 KB. Now once full application developed than the size will exceed the internal flash size (512KB) and plus we needed storage which can hold firmware update image separately.
So we planned to use SPI flash (S25LP064A-JBLE, http://www.issi.com/WW/pdf/IS25LP032-064-128.pdf, serial flash memory) of 4MB\8MB to boot and run firmware.
is it recommended to run code from SPI flash? how can I map external flash memory directly to CPU memory space? Can anyone give an example that contains this memory mapping(linker script etc..) or demo application in which LPC546xx uses SPI FLASH?
Generally speaking it's not recommended, or differently put: the closer to the CPU the better. Both S25LP064A and LPC546xx however support XIP, so it is viable.
This is not a trivial issue as many aspects are affecting. I.e. issue is best avoided and should really have been ironed out in the planning stage. Embedded Systems are more about compromising than anything and making the right/better choices takes skill end experience.
Same question with replies on the NXP forum: link
512K of NVRAM is huge. There are almost certainly room for optimisations even if 3'rd party libraries are used.
On a related note this discussion concerning XIP should give valuable insight: link.
I would strongly encourage use of file-systems if not done already, for which external storage is much better suited. The further from the computational unit, the more relevant. That's not XIP and the penalty is copy-to-RAM either way you do it. I.e. performance will be slower. But in my experience, the need for speed has often-times not been thoroughly considered and at least partially greatly overestimated.
Regarding your mentioning of RTOS and FW-upgrade:
Unless it's a poor RTOS there's file-system awareness built in. Especially for FW upgrading (Note: you'll need room for 3 images, factory reset included), unless already supported by the SoC-vendor by some other means (OTA), it will make life much easier and less risky. If there's no FS-awareness, it can be added.
FW upgrade requires a lot of extra storage. More if simpler. Simpler is however also safer which especially for FW upgrades matters hugely. In the simplest case (binary flat image), you'll need at least twice the amount of memory you're already consuming.
All-in-all: I think the direction you're going is viable and depending on the actual situation perhaps your only choice.

Will vulkan api handle window creation?

I have heared vulkan will unify the initialisation on different operating systems. Does that mean vulkan creates the window, handles mouse/keyboard events so I can avoid using os specific programming?
It won't. Window creation will be platform specific and an WSI extension will let you link the window to a renderable Image that you can push to the screen.
From information gleaned out of the presentations that have been given I expect that you will use a platform specific WSI Extension to create a Swapchain for your window.
Then each time you wish to push a frame to the screen you need to acquire a presentable image from the swapchain; render to it and then present it.
see this slide pack from slide 109 onward.
No, Vulkan is a low-level API for accessing GPUs. It does not deal with windows and input. In fact it can be used easily in a "headless" manner with no visual output at all.
Porbably not, Vulkan API is an Graphics Library much like OpenGL.
Where in Linux Ubuntu OpenGL is used for animation effects of the desktop in Unity and could be replaced with Vulkan for better performance.
But I don't think Windows will change it as they have their own DirectX Graphics Library and would be weird if they use something else instead their own software.
The most applications that are going to benefit from Vulkan are Games and other software that uses either 2D or 3D rendering.
It's very likely that most of the games are going to change to Vulkan because it's Cross-platform and therefore they will gain more users which equals to more profit.
Khronos (Vulkan API developers) are also bringing out tools that will largely port your application from OpenGL or DX12 to Vulkan therefore requiring less development/porting from the software developers side.
So...
Window creation, likely. (Although the code behind the window is CPU side, the library that draws the window on screen might be using Vulkan) - this differs greatly from which OS, distribution and version you are working on.
Mouse/keyboard events, no as this doesn't require any graphical calculations but CPU calculations.
Window (frames) are general desktop manager controls; you could display the vulkan app's content in the client area, otherwise vulkan would have to provide interfaces to the desktop manager for window creation (GUI library). Someone could simply create a device context (DC in windows, similar for the X server) then manage the "vulkan app" manually like a windowed game with no chrome (frameless), but this would be a great deal of work right now.
Old hand Windows developer bible addressing device context and rendering among many other things: Programming Windows®, Fifth Edition (Developer Reference) 5th Edition. Very good read and from an agnostic point of view provides a great deal of carry over knowledge loosely applicable to most systems.

Embedded System and Serial Flash wear issues

I am using a serial NOR flash (SPI Based) for my embedded application and also I have to implement a file system over it. That makes my NOR flash more prone to frequent erase and write cycles where having a wear level Algorithm comes into picture. I want to ask few questions regarding the same:
First, is it possible to implement a Wear Level Algorithm for Nor flash, if yes then why most of the time I find the solutions for NAND Flash and not NOR Flash?
Second, are serial SPI based low cost NAND flashes available, if yes then kindly share the part number for the same.
Third, how difficult is it to implement our own Wear Level algorithm?
Fourth, I have also read/heard that industrial grade NOR Flashes have higher erase/write cycles (in millions!!), is this understanding correct? If yes then kindly let me know the details of such SPI NOR Flash, which may also lead to avoiding implementation of wear level algorithm, if not completely then since I'm planning to implement my own wear level algorithm, it might give me a little room and ease in certain areas to implement the wear level algorithm.
The constraint to all these point is cost, I would want to have low cost solution to these issues.
Thanks in Advance
Regards
Aditya Mittal
(mittal.aditya12#gmail.com)
Implementing a wear-levelling algorithm is is not trivial, but not impossible either:
Your wear-levelling driver needs to know when disk blocks are no longer used by the filing system (this is known as TRIM support on modern SSDs). In practice, this means you need to modify your block driver API and filing systems above it, or have the wear-levelling driver aware of the filing system's free-space map. This second option is easy for FAT, but probably patented.
You need to reserve at least an erase-unit + a few allocation units to allow erase unit recycling. Reserving more blocks will increase performance
You'll want a background thread to perform asynchronous erase-unit recycling
You'll need to test, test an test again. When I last built one of these, we built a simulation of both flash and ran the real filing system on top it, and tortured the system for weeks.
There are lots and lots of patents covering aspects of wear-leveling. By the same token, there are two at least two wear-levelling layers in the Linux Kernel.
Given all of this, licensing a third-party library is probably cost-effective,
Atmel/Adesto etc. make those little serial flash chips by the billion. They also have loads of online docs. I suspect that the serial flash beetles don't implement wear-levelling because of cost - the devices they are typically used in are very cheap and tend to have a limited lifetime anyway. Bulk, 4-line NAND flash that is expected to see heavier and lengthy use, (eg. SD cards), have complex, (relatively), built-in controllers that can implement wear-levelling in a transparent manner.
I no longer use one-pin interface serial flash, partly due to the wear issue. An SD-card is cheap enough for me to use and, even if one does break, an on-site technician, (or even the customer), can easily swap it out.
Implementing a wear-levelling algo. is too expensive, both in terms of development time, (especially testing if the device has to support a file system that must not corrupt on power fail etc), and CPU/RAM for me to bother.
If your product is so cost-sensitive that you have to use serial NOR flash, I suggest that you ignore the issue.

How do you hack/decompile Camera firmware? (w/ decompiling tangent)

I wanted to know what steps one would need to take to "hack" a camera's firmware to add/change features, specifically cameras of Canon or Olympus make.
I can understand this is an involved topic, but a general outline of the steps and what I issues I should keep an eye out for would be appreciated.
I presume the first step is to take the firmware, load it into a decompiler (any recommendations?) and examine the contents. I admit I've never decompiled code before, so this will be a good challenge to get me started, any advice? books? tutorials? what should I expect?
Thanks stack as always!
Note : I know about Magic Lantern and CHDK, I want to get technical advise on how they were started and came to be.
http://magiclantern.wikia.com/wiki/Decompiling
http://magiclantern.wikia.com/wiki/Struct_Guessing
http://magiclantern.wikia.com/wiki/Firmware_file
http://magiclantern.wikia.com/wiki/GUI_Events/550D
http://magiclantern.wikia.com/wiki/Register_Map/Brute_Force
I wanted to know what steps one would need to take to "hack" a
camera's firmware to add/change features, specifically cameras of
Canon or Olympus make.
General steps for this hacking/reverse engineering:
Gathering information about the camera system (main CPU, Image coprocessor, RAM/Flash chips..). Challenges: Camera system makers tend to hide such sensitive information. Also, datasheets/documentation for proprietary chips are not released to public at all.
Getting firmware: through dumping Flash memory inside the camera or extracting the firmware from update packages used for camera firmware update. Challenges: Accessing readout circuitry for flash is not a trivial job specially with the fact that camera systems have one of the most densely populated PCBs. Also, Proprietary firmware are highly protected with sophisticated encryption algorithms when embedded into update packages.
Dis-assembly: getting a "bit" more readable instructions out of the opcode firmware. Challenges: Although dis-assemblers are widely available, they will give you the "operational" equivalent assembly code out of the opcode with no guarantee for being human readable/meaningful.
Customization: Just after understanding most of the code functionalities, you can make modifications that need not to harm normal operation of the camera system. Challenges: Not an easy task.
Alternatively, I highly recommend you to look for an already open source camera software (also HW). You can learn a lot about camera systems.
Such projects are: Elphel and AXIOM

Mindset difference between workstation and embedded programmers [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
What do you think are the difference in mindset between a programmer doing work for a desktop environment (windows, linux, whatever...) and someone doing work on an embedded system?
A simple example I can think of is that in an embedded environment, I always check that a malloc is not NULL. Most code I have seen that target desktops is certainly not diligent in checking malloc return value.
Any other examples of mindset differences?
Funny that you mention malloc() specifically in your example.
In every hard-real-time, deeply embedded system that I've worked on, memory allocation is managed specially (usually not the heap, but fixed memory pools or something similar)... and also, whenever possible, all memory allocation is done up-front during initialization. This is surprisingly easier than most people would believe.
malloc() is vulnerable to fragmentation, is non-deterministic, and doesn't discrminate between memory types. With memory pools, you can have pools that are located/pulling from super fast SRAM, fast DRAM, battery-backed RAM (I've seen it), etc...
There are a hundred other issues (in answer to your original question), but memory allocation is a big one.
Also:
Respect for / knowledge of the hardware platform
Not automatically asssuming the hardware is perfect or even functional
Awareness of certain language apects & features (e.g., exceptions in C++) that can cause things to go sideways quickly
Awareness of CPU loading and memory utilization
Awareness of interrupts, pre-emption, and the implications on shared data (where absolutely necessary -- the less shared data, the better)
Most embedded systems are data/event driven, as opposed to polled; there are exceptions of course
Most embedded developers are pretty comfortable with the concept of state machines and stateful behavior/modeling
Desktop programmers view resources as practically unlimited. Memory, computing power, drive space. Those never run out. Embedded programmers focus intently on all of those.
Oh, and embedded programmers also often have to worry about memory alignment issues. Desktop coders don't. The Arm chips care. x86 chips don't.
I desktop environment there's the idea that "hey I can always release an update or patch to fix this later." In embedded design, you get more "this has to work cause we don't want to recall the device or release an even longer patching program."
size matters
2 things - as Suroot already mentioned, once you release a desktop app, it doesn't have to be "forever", especially nowadays.
But in embedded, once you "ship it", it's on its way to Mars so you're not going to be able to pull it back.
Also one of the major differences are that embedded programmers are generally a LOT more conscious about efficient code and memory management - desktops run horrible code really fast, embedded doesn't.