As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
What do you think are the differences in mindset between a programmer doing work for a desktop environment (Windows, Linux, whatever...) and someone doing work on an embedded system?
A simple example I can think of is that in an embedded environment, I always check that malloc does not return NULL. Most code I have seen that targets desktops is certainly not diligent about checking the malloc return value.
Any other examples of mindset differences?
Funny that you mention malloc() specifically in your example.
In every hard-real-time, deeply embedded system that I've worked on, memory allocation is managed specially (usually not the heap, but fixed memory pools or something similar)... and also, whenever possible, all memory allocation is done up-front during initialization. This is surprisingly easier than most people would believe.
malloc() is vulnerable to fragmentation, is non-deterministic, and doesn't discriminate between memory types. With memory pools, you can have pools that are located in/pulling from super-fast SRAM, fast DRAM, battery-backed RAM (I've seen it), etc...
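To give an idea of what that looks like, here is a minimal fixed-block pool sketch (the block size, count, and names are illustrative, not from any particular project): all storage is reserved up-front, allocation and release are O(1), and there is no fragmentation.

```cpp
#include <cstddef>
#include <cstdint>

template <std::size_t BlockSize, std::size_t BlockCount>
class FixedPool {
public:
    FixedPool() {
        // Chain every block into a free list once, during initialization.
        for (std::size_t i = 0; i < BlockCount; ++i) {
            void* block = storage_ + i * BlockSize;
            *static_cast<void**>(block) = free_;
            free_ = block;
        }
    }
    void* allocate() {
        if (free_ == nullptr) return nullptr;   // pool exhausted: caller must check
        void* block = free_;
        free_ = *static_cast<void**>(free_);    // pop the head of the free list
        return block;
    }
    void release(void* block) {
        *static_cast<void**>(block) = free_;    // push the block back onto the list
        free_ = block;
    }
private:
    static_assert(BlockSize >= sizeof(void*), "block must be able to hold a pointer");
    alignas(std::max_align_t) std::uint8_t storage_[BlockSize * BlockCount];
    void* free_ = nullptr;
};

// For example, a pool of 32 fixed-size 64-byte message buffers, created up-front:
FixedPool<64, 32> messagePool;
```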
There are a hundred other issues (in answer to your original question), but memory allocation is a big one.
Also:
Respect for / knowledge of the hardware platform
Not automatically assuming the hardware is perfect or even functional
Awareness of certain language aspects & features (e.g., exceptions in C++) that can cause things to go sideways quickly
Awareness of CPU loading and memory utilization
Awareness of interrupts, pre-emption, and the implications on shared data (where absolutely necessary -- the less shared data, the better)
Most embedded systems are data/event driven, as opposed to polled; there are exceptions of course
Most embedded developers are pretty comfortable with the concept of state machines and stateful behavior/modeling; a minimal sketch follows after this list
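As a toy illustration of the state-machine point above (the states, events, and the heater scenario are made up for this example):

```cpp
// Toy event-driven state machine for a heater control loop.
enum class State { Idle, Heating, Fault };
enum class Event { TempLow, TempOk, SensorError };

State step(State current, Event ev) {
    switch (current) {
        case State::Idle:
            if (ev == Event::TempLow)     return State::Heating;
            if (ev == Event::SensorError) return State::Fault;
            return State::Idle;
        case State::Heating:
            if (ev == Event::TempOk)      return State::Idle;
            if (ev == Event::SensorError) return State::Fault;
            return State::Heating;
        case State::Fault:
            return State::Fault;   // latch the fault until an explicit reset
    }
    return State::Fault;
}
```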
Desktop programmers view resources as practically unlimited. Memory, computing power, drive space. Those never run out. Embedded programmers focus intently on all of those.
Oh, and embedded programmers also often have to worry about memory alignment issues. Desktop coders generally don't: ARM chips care, while x86 chips mostly don't (unaligned access is handled in hardware, at worst with a performance penalty).
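To illustrate the alignment point (a hedged sketch; the buffer and offsets are made up): casting an arbitrary byte offset to a wider pointer can fault on some ARM cores, while a memcpy into an aligned local is safe everywhere.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical packet buffer; offset 1 is not 4-byte aligned.
uint8_t packet[8] = {0x01, 0x78, 0x56, 0x34, 0x12, 0, 0, 0};

uint32_t read_unaligned_bad() {
    // May fault (or silently misbehave) on cores that trap unaligned loads.
    return *reinterpret_cast<uint32_t*>(&packet[1]);
}

uint32_t read_unaligned_safe() {
    uint32_t v;
    std::memcpy(&v, &packet[1], sizeof v);  // compiler emits safe byte accesses where needed
    return v;
}
```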
In a desktop environment there's the idea that "hey, I can always release an update or patch to fix this later." In embedded design, you get more of "this has to work, because we don't want to recall the device or run an even longer patching program."
size matters
Two things: as Suroot already mentioned, once you release a desktop app, it doesn't have to be "forever", especially nowadays.
But in embedded, once you "ship it", it's on its way to Mars so you're not going to be able to pull it back.
Also, one of the major differences is that embedded programmers are generally a LOT more conscious about efficient code and memory management - desktops run horrible code really fast; embedded systems don't.
We are using an LPC546xx-family microcontroller in our project. Currently, at the initial stage, we are finalizing the software and hardware requirements. The basic firmware (which contains the RTOS, third-party stacks, libraries, etc.) is currently 480 KB. Once the full application is developed, the size will exceed the internal flash (512 KB), and in addition we need storage that can hold a firmware-update image separately.
So we planned to use a 4 MB/8 MB SPI flash (S25LP064A-JBLE, http://www.issi.com/WW/pdf/IS25LP032-064-128.pdf, serial flash memory) to boot and run the firmware from.
Is it recommended to run code from SPI flash? How can I map external flash memory directly into the CPU memory space? Can anyone give an example of this memory mapping (linker script, etc.) or a demo application in which the LPC546xx uses SPI flash?
Generally speaking it's not recommended, or put differently: the closer to the CPU, the better. Both the S25LP064A and the LPC546xx support XIP, however, so it is viable.
This is not a trivial issue, as many aspects come into play; i.e. the issue is best avoided and should really have been ironed out in the planning stage. Embedded systems are more about compromising than anything else, and making the right/better choices takes skill and experience.
Same question with replies on the NXP forum: link
512 KB of NVRAM is huge. There is almost certainly room for optimisations even if 3rd-party libraries are used.
On a related note this discussion concerning XIP should give valuable insight: link.
I would strongly encourage the use of a file system if you are not using one already; external storage is much better suited for that, and the further the storage is from the computational unit, the more relevant this becomes. That is not XIP, and the penalty is a copy-to-RAM step either way you do it, i.e. performance will be slower. But in my experience, the need for speed has often not been thoroughly considered and is at least partially greatly overestimated.
Regarding your mentioning of RTOS and FW-upgrade:
Unless it's a poor RTOS, there is file-system awareness built in. Especially for FW upgrading (note: you'll need room for three images, a factory-reset image included), unless that is already supported by the SoC vendor by some other means (OTA), a file system will make life much easier and less risky. If there's no FS awareness, it can be added.
FW upgrade requires a lot of extra storage; more, the simpler the scheme. Simpler is, however, also safer, which matters hugely for FW upgrades in particular. In the simplest case (a flat binary image), you'll need at least twice the amount of memory you're already consuming.
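As a rough illustration of the "twice the memory" point, a flat A/B slot layout might look like this (the addresses, sizes, magic value, and header layout are hypothetical, not LPC546xx or S25LP064A specifics):

```cpp
#include <cstdint>

// Hypothetical layout in external SPI flash: two image slots plus a header each.
constexpr std::uint32_t kSlotSize    = 0x00080000;              // 512 KB per slot
constexpr std::uint32_t kSlotAOffset = 0x00000000;
constexpr std::uint32_t kSlotBOffset = kSlotAOffset + kSlotSize;

struct ImageHeader {
    std::uint32_t magic;     // marks a programmed slot (value is made up here)
    std::uint32_t version;   // monotonically increasing build number
    std::uint32_t length;    // payload length in bytes
    std::uint32_t crc32;     // integrity check over the payload
};

// Pick the newest slot whose header and CRC check out; fall back to the other.
std::uint32_t select_boot_offset(const ImageHeader& a, const ImageHeader& b,
                                 bool aCrcOk, bool bCrcOk) {
    constexpr std::uint32_t kMagic = 0x464D5746u;   // hypothetical marker
    const bool aValid = (a.magic == kMagic) && aCrcOk;
    const bool bValid = (b.magic == kMagic) && bCrcOk;
    if (aValid && bValid) return (a.version >= b.version) ? kSlotAOffset : kSlotBOffset;
    if (aValid) return kSlotAOffset;
    return kSlotBOffset;   // last resort
}
```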
All-in-all: I think the direction you're going is viable and depending on the actual situation perhaps your only choice.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 3 years ago.
I've researched a bit and found out about the Intel Pentium Pro, AMD K7, and IBM PowerPC, but these are pretty old. I'm not able to find any info about current-day processors that use these mechanisms for dynamic scheduling.
Every modern OoO exec CPU uses Tomasulo's algorithm for register renaming. The basic idea of renaming onto more physical registers in a kind of SSA dependency analysis hasn't changed.
Modern Intel CPUs like Skylake have evolved some since Pentium Pro (e.g. renaming onto a physical register file instead of holding data right in the ROB), but PPro and the P6 family is a direct ancestor of the Sandybridge-family. See https://www.realworldtech.com/sandy-bridge/ for some discussion of the first member of that new family. (And if you're curious about CPU internals, a much more in-depth look at it.) See also https://agner.org/optimize/ but Agner's microarch guide focuses more on how to optimize for it, e.g. that register renaming isn't a bottleneck on modern CPUs: rename width matches issue width, and the same register can be renamed 4 times in an issue group of 4 instructions.
Advancements in managing the RAT include Nehalem introducing fast-recovery for branch misses: snapshot the RAT on branches so you can restore to there when you detect a branch miss, instead of draining earlier un-executed uops before starting recovery.
Also mov-elimination and xor-zeroing elimination: they're handled at register-rename time instead of needing a back-end uop to write the register. (For xor-zeroing, presumably there's a physical zero register and zeroing idioms point the architectural register at that physical zero. What is the best way to set a register to zero in x86 assembly: xor, mov or and? and Can x86's MOV really be "free"? Why can't I reproduce this at all?)
If you're going to do OoO exec at all, you might as well go all-in, so AFAIK nothing modern does just scoreboarding instead of register renaming. (Except for in-order cores that scoreboard loads, so cache-miss latency doesn't stall until a later instruction actually reads the load's target register.)
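To make the renaming idea concrete, here is a toy sketch (not any real microarchitecture; real CPUs also maintain a free list and reclaim physical registers at retirement):

```cpp
#include <array>
#include <cstdio>

// Toy register-alias-table (RAT): 8 architectural registers renamed onto an
// unbounded pool of physical registers, so independent writes to the same
// architectural register no longer serialize (no WAW/WAR hazards).
struct Renamer {
    std::array<int, 8> rat{};   // architectural reg -> current physical reg
    int nextPhys = 8;           // physical regs 0..7 hold the initial values

    Renamer() { for (int i = 0; i < 8; ++i) rat[i] = i; }

    // Rename one instruction "dst = op(src1, src2)".
    void rename(int dst, int src1, int src2) {
        int p1 = rat[src1];     // read current mappings for the sources
        int p2 = rat[src2];
        int pd = nextPhys++;    // allocate a fresh physical destination
        rat[dst] = pd;          // later readers of dst now see pd
        std::printf("p%d = op(p%d, p%d)\n", pd, p1, p2);
    }
};

int main() {
    Renamer r;
    r.rename(0, 1, 2);   // r0 = r1 op r2   -> p8
    r.rename(0, 3, 4);   // r0 = r3 op r4   -> p9, no WAW hazard with p8
    r.rename(5, 0, 0);   // r5 = r0 op r0   -> reads p9, the newest r0
}
```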
There are still in-order execution cores that don't do either, leaving instruction scheduling / software-pipelining up to compilers / humans. aka statically scheduled. This is not rare; widely used budget smartphone chips use cores like ARM Cortex-A53. Most programs bottleneck on memory, and you can allow some memory-level parallelism in an in-order core, especially with a store buffer.
Sometimes energy per computation is more important than performance.
Tomasulo's algorithm dates back to 1967. It's quite old and several modifications and improvements have been made to it. Also, new dynamic scheduling methods have been developed.
Check out http://adusan.blogspot.com.au/2010/11/differences-between-tomasulos-algorithm.html
Likewise, pure Scoreboarding is not used anymore, at least not in mainstream architectures, but its core concept is used as a base element for modern dynamic scheduling techniques.
It is fair to say that although they're not used as is anymore, some of their features are still maintained in modern dynamic scheduling and out-of-order execution techniques.
I am looking for an RTOS for the ARM Cortex-M/R series (developing in C++).
Can someone recommend a good RTOS for the ARM Cortex-M or R series?
Thank you.
Getting an answer of any value would require someone to have objectively evaluated all of them, and that is unlikely.
Popularity and suitability are not necessarily the same thing. You should select the RTOS that has the features your application needs, works with your development tools, and whose licensing model and costs meet your needs and budget.
The tool-chain you use is a definite consideration - kernel aware debugging and start-up projects are all helpful in successful development. Some debugger/RTOS combinations may even allow thread-level breakpoints and debugging.
Keil's MDK-ARM includes a simple RTOS with priority based pre-emptive scheduling and inter-process communication as well as a selection of middleware such as a file system, and TCP/IP, CAN and USB stacks included at no extra cost (unless you want the source code).
IAR offer integrations with a number of RTOS products for use with EWB. Their ARM EWB page lists the RTOSes with built-in and vendor plug-in support.
Personally I have used Keil RTX but switched to Segger embOS because at the time RTX was not as mature on Cortex-M and caused me a few problems. Measured context switch times for RTX were however faster than embOS. It is worth noting that IAR's EWB integrates with embOS so that would probably be the simpler route if you have not already invested in a tool-chain. I have also evaluated FreeRTOS (identical to OpenRTOS but with different licensing and support models) on Cortex M, but found its API to be a little less sophisticated and complete than embOS, and with significantly slower context switch times.
embOS has similar middleware support to RTX, but at additional cost. However I managed to hook in an alternative open source file system and processor vendor supplied USB stack without any problems in both embOS and RTX, so middleware support may not be critical in all cases.
Another option is Micro C/OS-II. It has middleware support, again at additional cost. Its scheduler is a little more primitive than most others, requiring that every thread have a distinct priority level, and it does not support round-robin/time-slice scheduling, which is often useful for non-realtime background tasks. It is popular largely through the associated book that describes the kernel implementation in detail. The newer Micro C/OS-III overcomes these scheduler limitations.
To the other extreme eCos is a complete RTOS solution with high-end features that make it suitable for many applications where you might choose say Linux but need real-time support and a small footprint.
The point is that you can probably take it as read that an RTOS supports pre-emptive scheduling and IPC, and has a reasonable performance level (although I mentioned varying context switch times, the range was between 5 and 15 µs at 72 MHz on an STM32F1xx). So I would look at things like maturity (how long has the RTOS been available for your target - you might even look at release notes to see how quickly it reached maturity and what problems there may have been), tool integration, whether the API suits your needs and intended software architecture, middleware support either from the vendor or third parties, and licensing (can you afford it, and can you legally deploy it in the manner you intend?).
With respect to using C++, most RTOSes present a C API (even eCos, which is written in C++). This is not really a problem since C code is interoperable with C++ at the binary level; however, you can usefully leverage the power of C++ to make the RTOS choice less critical. What I have done is define a C++ RTOS class library that presents a generic API providing the facilities that I need; for example I have classes such as cTask, cMutex, cInterrupt, cTimer, cSemaphore etc. The application code is written to this API, and the class library is implemented for any number of RTOSes. This way the application code can be ported with little or no change to a number of targets and RTOSes because the class library acts as an abstraction layer. I have successfully implemented this class library for Wind River VxWorks, Segger embOS, Keil RTX, and even Linux and Windows for simulation and prototyping.
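A minimal sketch of what such a wrapper can look like (the class names follow the convention above; this particular back-end sits on the C++ standard library, the way one might build the desktop-simulation port, and is not any vendor's API):

```cpp
#include <mutex>
#include <thread>

// Generic wrapper API: application code uses cMutex/cTask; only this layer
// changes when the underlying RTOS changes. An embOS or RTX build would call
// the native kernel API instead of std::thread/std::mutex.
class cMutex {
public:
    void lock()   { m_.lock(); }
    void unlock() { m_.unlock(); }
private:
    std::mutex m_;
};

class cTask {
public:
    explicit cTask(const char* name) : name_(name) {}
    virtual ~cTask() { if (thread_.joinable()) thread_.join(); }
    void start() { thread_ = std::thread([this] { run(); }); }
protected:
    virtual void run() = 0;   // application code lives in subclasses
private:
    const char* name_;
    std::thread thread_;
};
```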
Some vendors do provide C++ wrappers to their RTOS, such as Accelerated Technology's Nucleus C++ for the Nucleus RTOS, but that does not necessarily provide the abstraction you might need to change the RTOS without changing the application code.
One thing to be aware of with C++ development in an RTOS is that most RTOS libraries are initialised in main() while C++ invokes constructors for static global objects before main() is called. It is common for some RTOS calls to be invalid before RTOS initialisation, and this can cause problems - especially as it differs between RTOSes. One solution is to modify your C runtime start-up code so that the RTOS initialisation is invoked before the static constructors, and main(), but after the basic C runtime environment is established. Another solution is simply to avoid RTOS calls in static objects.
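One way to sidestep that, sketched below with placeholder native calls (not a real RTOS API), is to construct kernel objects lazily on first use rather than in static constructors:

```cpp
// Placeholders for whatever the underlying RTOS provides (names are made up).
void* native_queue_create(int depth);
void  native_queue_post(void* queue, int msg);

// Instead of creating the queue in a static constructor (which may run before
// the kernel is initialized), create it lazily on first use, after main() has
// initialized the RTOS.
class LazyQueue {
public:
    void post(int msg) {
        if (!handle_) handle_ = native_queue_create(16);
        native_queue_post(handle_, msg);
    }
private:
    void* handle_ = nullptr;
};

LazyQueue logQueue;   // safe as a global: its constructor touches no RTOS state
```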
I'm new to programming. I know C/C++ and the basics of Win32. I am now trying to do graphics, but I want the fastest connection to the screen. I realize most people go with OpenGL or DirectX, but I don't want the overhead. I want to start from scratch and control the pixel data. I know about GDI bitmaps, but I'm not sure if that is the best access to the data. I know that I have to talk through Windows, which is the trouble. Do OpenGL and DirectX compile down to the level of GDI, or is there a special way they do it - do they bypass it or use similar code? Please don't ask why I want to do this. Maybe an explanation of how this is done might help, like how Windows combines all windows to create the final image.
The most direct access to pixel data is via shaders, which are supported by both OpenGL and Direct3D. They are cross-compiled and run directly on the video card. They do not use OpenGL, they do not have OpenGL overhead. OpenGL is just used to get them to the graphics card's own processor in the first place.
Anything you do on the CPU has to first be copied across the bus (typically PCI-express) to the video card. GDI is actually many levels removed from the graphics memory.
OpenGL, Direct3D, Direct2D, GDI, and GDI+ are all abstraction layers. The GPU vendor writes a driver that accepts these standard command functions, re-encodes the data in the card-specific format, then sends it to the card. Typically OpenGL and Direct3D are the most heavily optimized and also require the least amount of re-encoding.
How Windows combines the various on-screen windows to create the full-screen image depends heavily on what version of Windows you are talking about. DWM changed everything. Since DWM was introduced in Vista, programs render to their personal areas of GPU memory, then the window manager uses the texture lookup units of the video card to efficiently layer each of the programs' individual areas onto the screen primary buffer. When a program (usually a game) requests full-screen exclusive access, this step is skipped and the driver causes rendering commands from that application to affect the primary screen buffer directly.
Assuming that the CPU is generating the data which needs to be displayed, the fastest and most efficient approach is likely to be block-copying that data into a vertex buffer object and using OpenGL commands to rasterize it as lines or polygons or whatever (or the Direct3D equivalent). If you previously thought that GDI was the low-level interface, you've got some reading ahead of you to make this work. But it will run several orders of magnitude faster than pure GDI. So much faster, in fact, that the new architecture is that GDI (and WPF) is built on top of Direct2D and/or Direct3D.
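A rough sketch of that approach, assuming a GL context and a function loader such as GLEW are already set up (the function name and usage hint here are illustrative):

```cpp
#include <GL/glew.h>   // or any other loader that exposes the buffer-object API
#include <vector>

// Copy CPU-generated vertex data into a buffer object in video memory.
GLuint uploadVertices(const std::vector<float>& cpuData) {
    GLuint vbo = 0;
    glGenBuffers(1, &vbo);                 // create a buffer object on the GPU
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    // One block copy across the bus; subsequent draw calls read from video memory.
    glBufferData(GL_ARRAY_BUFFER,
                 cpuData.size() * sizeof(float),
                 cpuData.data(),
                 GL_STATIC_DRAW);
    return vbo;
}
```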
but I want the fastest connection to the screen
I want to start from scratch and control the pixel data
You're asking for the impossible. You get the best performance when you use GPU-accelerated functions. However, in this case you don't get direct access to the pixel data, and trying to access it (read it back or write it) will negatively impact performance, because you'll have to transfer data between system memory and video memory. As a result, anything that is streamed from system memory to video memory should be handled with care. Plus, you'll have to study the API.
If you "start from scratch" and do rendering on CPU, you'll get easy access to pixel data and full control over the rendering, but performance will be inferior to GPU (CPU is less suitable for parallel processing, and system memory can be slower by order of magnitude than video memory), plus you'll spend significant amount of time reinventing the wheel.
Do Opengl and DirectX compile down to the level of GDI or is there a special way they do it, do they bypass or use similar code?
No. They communicate with the graphics hardware nearly directly, using drivers provided by the hardware manufacturer. And those "direct hardware access" interfaces used by DirectX/OpenGL won't be available to you - they're hardware-specific and manufacturer-specific, may be internal, and are possibly even protected by patents.
There are, of course, a few legacy hardware interfaces which ARE available to you (namely VESA, or VGA mode 13h); however, their direct use is normally forbidden by the operating system (you can't easily access VESA on Windows), so to access them you'll have to either boot MS-DOS, use a custom operating system, or use helper libraries (such as SVGAlib on Linux), which might only function under root privileges. And of course, even if you actually use VESA/VGA to render something yourself, on any hardware newer than a RivaTNT 2 Pro the performance will be horrible compared to hardware-accelerated rendering done by OpenGL/DirectX. Have you ever seen how fast Windows XP works when it doesn't have a proper GPU driver (it takes a second to redraw a window)? That's how fast it is going to work with direct VESA/VGA access.
Please, Don't ask why I want to do this.
It makes sense to ask why you would want to do that. Your "I want direct low-level access" approach was suitable maybe 15-20 years ago, or in the DOS era. Right now the reasonable solution is to use an existing API (one that is maintained by somebody who isn't you) and find a way to fully utilize it. Of course, if you wanted to develop drivers, that would be another story.
Do Opengl and DirectX compile down to the level of GDI or is there a special way they do it
and
I realize most are going with Opengl or DirectX. But, I don't want the overhead
So what you're saying is: you have absolutely no clue what OpenGL or DirectX actually do, and yet you've decided that they are not efficient enough for your needs.
I'm sorry, but this is nonsense. It is impossible to answer a question like that.
In the real world, you have a small supercomputer dedicated to doing graphics. And you get access to it through OpenGL and DirectX.
And the reason they are fast is that they do NOT just "start from scratch and control the pixel data".
So please, if you want serious answers, try letting those with the knowledge to answer your questions decide which question is best.
The correct answer, if you want efficient graphics, is to use DirectX or OpenGL.
I'm looking for reasons developers ought to consider before developing and testing applications and games on a jailbroken device. My conviction is that if you want to publish your app to the App Store, you had better make sure you always test on a non-jailbroken device. E.g., if you are a serious developer, what do you have to consider before jailbreaking your only development device, or alternatively buying a second untampered device just for development?
The legal implications are fairly well known but it doesn't hurt to reiterate them. What I'm more interested in are all the technical reasons why development on a jailbroken iPhone will make your life harder (or sometimes easier if that exists, too).
For example, I've read that jailbroken devices can cause adverse behavior, bugs, and crashes which will not appear on a non-jailbroken device. But what those issues are remains in the dark. I'm looking for concrete evidence of bugs and misbehavior that is relatively common (e.g. it occurred to you, or to someone who blogged about it) when you test on a jailbroken device.
You always want to test on the device and configuration you will be releasing for. If you're going to release for jailbroken devices, test on jailbroken devices. If you intend to release through the App Store, test on an unbroken device.
I haven't worked on a jailbroken device, but I think the biggest issue would be that jailbroken devices do not enforce the same security restrictions as non-jailbroken devices. It would be easy to have a segment of code that relies on access that disappears on a non-jailbroken device.
You always want to develop and/or test on a device as close to your projected average user's as possible. One big mistake that developers have made for decades is building and testing new software on their high-horsepower developer workstations. They see that the software runs fine on their above-average systems, but when they release the software to average users running average systems, the software is too slow in real-world use. That isn't as much of an issue on mobile devices, but the principle is the same.
There is one big problem that took me more than a week to figure out:
In-app purchase development doesn't work on jailbroken devices (it gives invalid product IDs for all in-app purchases).
(Some reports say this applies to jailbroken devices with AppSync installed.)