GCC vs Green Hills on ARM - optimization

I'm interested in any comparisons between GCC and the Green Hills C compiler with regard to the memory footprint of generated code, specifically on ARM platforms.
Are there any benchmarks or comparisons for these compilers? Has anyone had any experience here that they'd like to share?

You should note that the Green Hills EULA explicitly prohibits licensees from publishing benchmarks.
What you can do is obtain an evaluation licence from Green Hills and perform your own benchmarking. That would be more trustworthy and representative in any case, since you could test it on real production code. Also, the results for, say, an ARM7 may be very different from those for a Cortex-M3, so any published figures may not be comparing like-for-like and may not be representative of your platform.
Beware also that I have experienced widely varying results from different binary distributions of GCC, even when they were ostensibly built from the same code-base version (specifically with software floating-point performance). So you are still probably best off trusting only your own evaluation results.
You might consider Keil and IAR at the same time which also have evaluation versions. Why are you considering just these two? People generally go with Green Hills when they have big budgets and can benefit from the RTOS integration and debugger capabilities available from a single source; any benefit you might get from using the compiler alone is unlikely to justify the license costs IMO.

I have not seen any benchmarks, but from my experience the two compilers are very similar in code size and in the code they generate.
Green Hills has lots of documentation and support if you want to reduce your memory footprint; with GCC it gets lonely very fast once you're off the beaten track. Green Hills also supports compressed executable images, which is great if you have limited flash but plenty of RAM.
I have also used custom runtime and C libraries with both compilers (this can save you some more space), but you will need to do some digging to get the information for GCC, whereas with Green Hills you can get some of it via a wizard that generates the build file.
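For the GCC side, a minimal sketch of the kind of library retargeting that saves space is shown below, assuming an arm-none-eabi GCC/newlib toolchain; the stub bodies, the _end symbol name, and the choice to discard output are illustrative assumptions, not a drop-in implementation.

/* Hypothetical syscalls.c: minimal newlib retargeting stubs, typically combined
   with -Os, -ffunction-sections/-fdata-sections and -Wl,--gc-sections to shrink
   the C library footprint. Symbol names follow the usual newlib convention. */
#include <sys/stat.h>
#include <sys/types.h>

extern char _end;                   /* end of .bss, assumed defined by the linker script */
static char *heap_ptr = &_end;

caddr_t _sbrk(int incr)             /* malloc() heap grows upward from _end */
{
    char *prev = heap_ptr;
    heap_ptr += incr;               /* no stack-collision check in this sketch */
    return (caddr_t)prev;
}

int _write(int fd, const char *buf, int len)
{
    (void)fd; (void)buf;            /* route to a UART in a real port; dropped here */
    return len;
}

/* Remaining stubs newlib expects once printf() and friends are pulled in. */
int _close(int fd)                  { (void)fd; return -1; }
int _fstat(int fd, struct stat *st) { (void)fd; st->st_mode = S_IFCHR; return 0; }
int _isatty(int fd)                 { (void)fd; return 1; }
int _lseek(int fd, int off, int w)  { (void)fd; (void)off; (void)w; return 0; }
int _read(int fd, char *buf, int n) { (void)fd; (void)buf; (void)n; return 0; }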


Valgrind vs. Linux perf correlation

Suppose that I choose the perf events instructions, LLC-load-misses, and LLC-store-misses. Suppose further that I test a program prog, varying its input. Is valgrind supposed to give me the "same" functional results for the same input and the same counter? That is, if one value in perf goes up, should the one in valgrind always do the same? Is there any impact from valgrind being a simulation that I should be aware of while profiling my code?
EDIT: BTW, before people grill me for not experimenting myself, I have to say that I (kinda) have, the problem is that I have a Sandybridge processor, and perf has a "bug" that prevents me from measuring LLC-* events. There is a patch, but I don't feel like recompiling my kernel...
Well, Cachegrind is a cache simulator. Even though it tries to mimic some of your hardware's characteristics (cache size, associativity, etc), it does not model every single feature and behavior of your system. Therefore you might in some cases see some differences.
For example, Valgrind's doc states that "Cachegrind simulates branch predictors intended to be typical of mainstream desktop/server processors of around 2004". Sandy Bridge processors first appeared in 2011, and you can guess that branch predictors have improved quite a lot since 2004.
That being said, Valgrind is still a wonderful tool to have in your toolbox.
What's the problem with perf's LLC events on Sandy Bridge processors? I use these events everyday at work on my Sandy Bridge laptop and it works as expected (archlinux 64bits, linux 3.6).

Are there modern compilers for high level languages on simple processors which produce self-modifying code?

Back in the days before caches and branch prediction, it was relatively common, if not encouraged, to write self-modifying code for certain kinds of optimizations. It was probably most common in games and demos written in assembler, in the eras from 8-bit machines up to early 32-bit machines such as the Amiga.
I'm not sure if any compilers from those days emitted self-modifying assembler or machine code.
What I'm wondering is whether there are any current/modern compilers that do. Obviously it would be useless or too difficult on powerful processors with caches.
But what about the very many simple processors such as those used in embedded systems? Is self-modifying code considered a viable optimization strategy by any modern compilers for any simple / 8-bit / embedded processor?
There is a question with a similar title, "Is there any self-improving compiler around?", but note that it's not about the same subject:
Please note that I am talking about a compiler that improves itself - not a compiler that improves the code it compiles.
All embedded systems today use flash ROM. I believe the Amiga and similar machines were RAM-based systems. The only way "self-modification" exists in embedded systems is where you have boot loaders that re-program certain parts of the flash depending on which program and/or functionality should be used.
Apart from that, it doesn't make sense for the program to modify itself at runtime. Running code from RAM is generally discouraged, for safety reasons (potential for accidental modifications caused by bugs) and for electrical reasons (RAM is volatile and far more sensitive to EMC than flash).

Advantages of atmega32 [closed]

What are the advantages of using the ATmega32 over other microcontrollers?
Is it better than PIC, ARM, and 8051?
Advantages
Still runs on 5 V, so legacy 5 V stuff interfaces cleaner
Even though it's 5 V capable, newer parts can run down to 1.8 V. This wide range is very rare.
Nice instruction set, very good instruction throughput compared to other processors (HCS08, PIC12/16/18).
High quality GCC port (no proprietary crappy compilers!)
"PA" variants have good sleep mode capabilities, in micro-amperes.
Well rounded peripheral set
QTouch capability
Disadvantages
Still 8-bit. An ARM is a 16/32-bit workhorse, and will push a good amount more data around, at much higher clock speeds, than any 8-bit.
Cost. Can be expensive compared to HCS08 or other bargain 8-bit processors.
GCC toolchain has quirks, like the split memory model and limited 16-bit pointers (see the sketch below).
Atmel is not the best supplier on the planet (at least they're not Maxim...)
In short, it is a very clean and easy-to-work-with 8-bit microcontroller.
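To make the "split memory model" quirk concrete, here is a minimal avr-gcc/avr-libc sketch; the string and function names are just illustrative, but PROGMEM and pgm_read_byte() are the standard avr-libc mechanism for keeping constant data in flash.

/* Flash and RAM are separate address spaces on the AVR, so constants kept
   in flash are tagged PROGMEM and must be read back explicitly. */
#include <avr/pgmspace.h>

static const char greeting[] PROGMEM = "hello";

char first_char(void)
{
    /* A plain greeting[0] would read from the same numeric address in RAM,
       which is a different memory entirely. */
    return pgm_read_byte(&greeting[0]);
}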
An 8051 is legacy: the tools are passable, the architecture is bizarre (idata? xdata? non-reentrant functions in most compilers by default?).
PIC before PIC24 is also bizarre (register banking) and poor clock->instruction throughput. There is no first-class open source C compiler either.
PIC32 is competing with ARM7TDMI and ARM Cortex-M3, based on an adapted MIPS core, and has a GCC port (not main-lined).
AVR32 is competing with Cortex-M3, and offers a pretty good value, especially in the low power area.
MSP430 is the king for ultra low-power devices, and has a passable GCC port (if you're not targeting 430X).
HCS08 is very inexpensive, but poor instruction throughput. Peripherals vary quite a bit.
ARM used to be a higher-cost entry point, but with the introduction of the Cortex-M3 architecture, the price has been dropping compared to an 8-bit. For example, the LPC13xx series is comparable to an ATmega32 in many ways. Luminary (TI) has quite an impressive peripheral set.
It depends. Firstly you have to know what you want from the microcontroller.
In general:
PIC:
Old architecture. This means it's either expensive or slow
Targets only the low-end market (< a few MHz)
There's a lot of code written for it
ARM
Scalable
Fast/cheap
The ATmega is somewhere in between
I find the PIC family (before the MIPS version) to have the most painful instruction set of all, which means assembler is the language of choice if you want to conserve space, get performance, have control, etc.
The 8051 is a little less painful, more registers, but still takes a handful of instructions to do anything useful (meaning you cannot compare these to other chips from a MHz perspective). I like AVR in many ways, they embrace the homebrew and developer community, or if not directly there is a much better family of developers out there compared to the competition. I don't like the instruction set, but it is decades ahead of the PIC and 8051. I like the MSP430 instruction set quite a bit, it is one of the best instruction sets for teaching assembler, TI is not as developer friendly though, it can be a struggle. The eZ430 was on the right path but the goodfet is better as you don't have it failing to work every other kernel version.
MSP430 and ARM have the best instruction sets as far as I am concerned, which leads to both good assembler and good compiler tools. You can find commercial tools for all of the above, and certainly free tools for 8051, MSP430, and ARM (MSP430 and ARM can use GCC, 8051 cannot; look for SDCC). For now mspgcc4.sf.net and CodeSourcery are the places for GCC-based tools for MSP430 and ARM. LLVM supports both; I was able to get LLVM 2.7 to beat the latest GCC in a Dhrystone test, but that is one test. LLVM trails behind in performance but is improving.
As far as finding and creating free cross compilers, I see LLVM as already the easiest to get and use, and going forward it will hopefully only get better. Sadly the MSP430 port for LLVM was a "look what I could do in an afternoon" PowerPoint presentation and not a serious port.
My answer is that it depends on what you are doing, and I recommend you try all of them. These days the evaluation boards are in the sub-US$50 range and some in the sub-US$30 range. Even within the ARM family (ST, Atmel, Stellaris, LPC, etc.) there is a wide variety of features and quirks that you will only find if you try them. Avoid the LPCXpresso, mbed2, and STM32 primers. Avoid LPC in general, and avoid the Cortex-M3 in general until you have cut your teeth on an ARM7. Look at SparkFun for Olimex and other boards. Although it is probably LPC, the ARMmite PRO and Arduino Pro are good choices. The eZ430 is a good MSP430 start, and I don't remember who is making 8051 stuff, Renesas perhaps. 8051s are not all created equal; the register space varies from one to another and you have to prepare for that. I would probably look for an 8051 simulator if you want to play with the 8051.
I see AVR and definitely ARM continuing to dominate; I would like to see the MSP430 be used for things other than just super low power. With ARM, AVR, and MSP430 you can use and get used to GCC tools now and in the future, which has a lot of benefits: even if GCC isn't the best compiler in the world, it is by far the best-supported compiler. I would avoid proprietary compilers and tools. I would look for devices that have non-proprietary programming interfaces that are field-programmable; JTAG is good, but, for example, the new SWD JTAG on the Cortex-M3 is bad. TI MSP was hurt by this, but some hacking has resolved it, at least for now. I really don't have much good to say about PIC and won't try. A big thing to look for is glue logic: does the part or family have the SPI or I2C or whatever bus you want to use, do you need an internal pull-up or a wired-OR input?
Some chips just don't have that option and you have to add external hardware. Do you need an interrupt, with conditioning? ARM tends to win on this because it is a core used by many vendors, so each ARM vendor puts its own I/O out there; you can still live in the ARM world and have many choices, while AVR and MSP are going to be very limited by comparison. With ARM the tools are going to be state of the art; ARM is the most used processor right now. AVR and MSP are special-project add-ons, less widely supported and fragile. Although ARM is low power compared to Intel on an SBC or computer platform, it is likely not as low-power as an AVR or MSP. You really need to look at your project and pick the right processor for the job; I wouldn't and don't limit myself to one family. With as cheap as the evaluation boards are, and with almost all able to use free tools, it is just a matter of putting in a few nights or weekends to learn each. I suggest learning more than one AVR, and learning more than one microprocessor.
At this end of the spectrum, there are only really two factors that make much difference. First, in smaller quantities, the only thing that matters at all is which architecture suits your development needs best. If you are already familiar with PIC, there's not much point in learning AVR, or vice versa. Pick an architecture that you like, then sort through the options on that architecture to see which model is up to your particular needs.
In quantity (say, 20 or more units), you might benefit by choosing just the right platform that precisely matches your devices' needs, to keep costs as low as possible.
In general, PIC and AVR platforms are good for simple, single-function devices, whereas ARM is used in cases where you need a full OS stack like QNX or Linux for things like TCP, or real time with OS services.
If you want the widest choice of peripherals, performance, price-point, software and tool support, and suppliers it would be hard to beat an ARM Cortex-M3 based part.
But addressing your question directly, the whole AVR range has a consistent architecture and common peripheral set from the Tiny to the Mega (not the AVR32, however, which is entirely different). This is the significant difference from PIC, where, when moving up the range (PIC10, 12, 16, 18, 24, 32), you get different peripheral designs, different instruction sets, and need to invest in different compilers and debug hardware.
The instruction set for AVR was designed for efficient C code compilation (again unlike PIC).
8051 is an architecture originally introduced by Intel decades ago, but now used as the core for 8 bit devices from a number of vendors. It has some clever tricks such as efficient multitasking context switches via its 8 duplicated register banks, and a block of bit addressable memory, but has a quirky memory architecture and limited address range (like most 8 bit devices). Great for small well targeted devices, but not truly general purpose.
ARM Cortex-M3 essentially replaces ARM7TDMI, and is a cleaner design with well thought out architecture. It requires minimal assembler start-up code and even ISRs and vector tables can be coded in C directly without any weird compiler extensions or assembler entry/exit code. Its bitbanding technique allows all memory and peripherals to be atomically bit addressable, which is useful for fast I/O and safe multithreading. Basically it is designed to allow C or C++ code at the system level without non-standard compiler extensions. It is of course a 32 bit architecture, so does not have the resource or arithmetic limitations of 8 bit devices. Prices for low-end parts compete with higher performance 8 bit devices, and blow most 16 bit devices out of the water (making 16 bit almost obsolete).
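For a concrete picture of the bit-banding mentioned above, here is a small C sketch; the alias-region arithmetic follows the Cortex-M3 bit-band mapping (SRAM at 0x20000000 aliased at 0x22000000, peripherals at 0x40000000 aliased at 0x42000000), while the flag variable and helper are purely illustrative.

#include <stdint.h>

/* Each bit in the bit-band regions has its own word-sized alias address:
   alias = alias_base + (byte_offset * 32) + (bit_number * 4).
   Writing 0 or 1 to the alias word atomically clears or sets that bit. */
#define BITBAND_SRAM(addr, bit) \
    (*(volatile uint32_t *)(0x22000000u + \
        (((uint32_t)(addr) - 0x20000000u) * 32u) + ((uint32_t)(bit) * 4u)))

#define BITBAND_PERIPH(addr, bit) \
    (*(volatile uint32_t *)(0x42000000u + \
        (((uint32_t)(addr) - 0x40000000u) * 32u) + ((uint32_t)(bit) * 4u)))

/* Example: atomically set bit 3 of a flag word (assumed to be placed in the
   bit-band SRAM region by the linker). */
static volatile uint32_t flags;

static inline void set_flag3(void)
{
    BITBAND_SRAM(&flags, 3) = 1;
}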
One other key thing to remember is that PIC and AVR are from single vendors, while 8051 and ARM are licensed cores. Each licensee adds their own peripheral set, so there is no commonality between vendors on peripherals, so device driver code needs porting when switching vendors, and you need to ensure the part has the peripherals you need. If you design your device layer well, this is seldom much of a problem.
Well, it isn't easy to answer. It mostly depends on what you used before. If you are already an AVR user, then it's good to use. On the other hand you can find PICs with similar capabilities, so I'd say it's mostly personal preference. I think that most ARMs are more capable than the ATmega32 series. If you want good advice, tell us what you plan to use it for.
AVRs have flat memory model and have free development tools and cheap development hardware is available for them.
I don't know enough about 8051 to comment.
Oh, and if you're thinking about the original ATmega32, I'd say it's a bad idea. It's going to be deprecated soon, so you may want to consider newer models from the ATmega32 series.

Optimisation, Compilers and Their Effects

(i) If a program is optimised for one CPU class (e.g. multi-core Core i7) by compiling the code on that CPU, will its performance be sub-optimal on CPUs from older generations (e.g. Pentium 4)? That is, may optimising prove harmful for performance on other CPUs?
(ii) For optimisation, compilers may use x86 extensions (like SSE4) which are not available on older CPUs. Is there a fall-back to some non-extension-based routine on older CPUs?
(iii) Is the Intel C++ compiler more optimising than the Visual C++ compiler or GCC?
(iv) Will a truly multi-core, threaded application perform efficiently on older CPUs (like the Pentium III or 4)?
Compiling on a platform does not mean optimizing for this platform. (maybe it's just bad wording in your question.)
In all compilers I've used, optimizing for platform X does not affect the instruction set, only how it is used, e.g. optimizing for i7 does not enable SSE2 instructions.
Also, optimizers in most cases avoid "pessimizing" non-optimized platforms, e.g. when optimizing for i7, typically a small improvement on i7 will not be chosen if it means a major hit for another common platform.
It also depends on the performance differences between the instruction sets - my impression is that they've become much smaller in the last decade (but I haven't delved too deep lately - I might be wrong for the latest generations). Also consider that optimizations make a notable difference only in a few places.
To illustrate possible options for an optimizer, consider the following methods to implement a switch statement:
sequence if (x==c) goto label
range check and jump table
binary search
combination of the above
the "best" algorithm depends on the relative cost of comparisons, jumps by fixed offsets and jumps to an address read from memory. They don't differ much on modern platforms, but even small differences can create a preference for one or other implementation.
It is probably true that optimising code for execution on CPU X will make that code less optimal on CPU Y than the same code optimised for execution on CPU Y. Probably.
Probably not.
Impossible to generalise. You have to test your code and come to your own conclusions.
Probably not.
For every argument about why X should be faster than Y under some set of conditions (choice of compiler, choice of CPU, choice of optimisation flags for compilation) some clever SOer will find a counter-argument, for every example a counter-example. When the rubber meets the road the only recourse you have is to test and measure. If you want to know whether compiler X is 'better' than compiler Y first define what you mean by better, then run a lot of experiments, then analyse the results.
I) If you did not tell the compiler which CPU type to favor, the odds are that it will be slightly sub-optimal on all CPUs. On the other hand, if you let the compiler know to optimize for your specific type of CPU, then it can definitely be sub-optimal on other CPU types.
II) No (for Intel and MS at least). If you tell the compiler to compile with SSE4, it will feel safe using SSE4 anywhere in the code without testing. It becomes your responsibility to ensure that your platform is capable of executing SSE4 instructions, otherwise your program will crash. You might want to compile two libraries and load the proper one. An alternative to compiling for SSE4 (or any other instruction set) is to use intrinsics, which will check internally for the best-performing set of instructions (at the cost of a slight overhead). Note that I am not talking about instruction intrinsics here (those are specific to an instruction set), but intrinsic functions.
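A minimal sketch of the "compile two versions and pick the proper one at run time" idea, assuming GCC or Clang on x86 (the __builtin_cpu_supports() builtin exists there; MSVC would use __cpuid instead). The sum_* function names and their separate translation units are hypothetical.

#include <stddef.h>

/* Assumed to be defined elsewhere: sum_sse42() in a file built with -msse4.2,
   sum_portable() in a file built without extensions (hypothetical names). */
long sum_sse42(const int *v, size_t n);
long sum_portable(const int *v, size_t n);

long sum_dispatch(const int *v, size_t n)
{
    __builtin_cpu_init();                    /* initialise the CPU feature data */
    if (__builtin_cpu_supports("sse4.2"))
        return sum_sse42(v, n);              /* fast path only on capable CPUs */
    return sum_portable(v, n);               /* fall-back for older CPUs */
}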
III) That is a whole other discussion in itself. It changes with every version, and may be different for different programs. So the only solution here is to test. Just a note though: Intel compilers are known not to compile well for running on anything other than Intel (e.g. intrinsic functions may not recognize the instruction set of an AMD or VIA CPU).
IV) If we ignore the on-die efficiencies of newer CPUs and the obvious architecture differences, then yes, it may perform as well on an older CPU. Multi-core processing is not dependent per se on the CPU type. But the performance is VERY dependent on the machine architecture (e.g. memory bandwidth, NUMA, chip-to-chip bus) and on differences in the multi-core communication (e.g. cache coherency, bus locking mechanism, shared cache). All this makes it impossible to compare newer and older CPU efficiencies in MP, but that is not what you are asking, I believe. So on the whole, an MP program made for newer CPUs should not use the MP aspects of older CPUs any less efficiently. Or in other words, just tweaking the MP aspects of a program specifically for an older CPU will not do much. Obviously you could rewrite your algorithm to use a specific CPU more efficiently (e.g. a shared cache may permit you to use an algorithm that exchanges more data between working threads, but this algorithm will die on a system with no shared cache, full bus locking and low memory latency/bandwidth), but it involves a lot more than just MP-related tweaks.
(1) Not only is it possible but it has been documented on pretty much every generation of x86 processor. Go back to the 8088 and work your way forward, every generation. Clock for clock the newer processor was slower for the current mainstream applications and operating systems (including Linux). The 32 to 64 bit transition is not helping, more cores and less clock speed is making it even worse. And this is true going backward as well for the same reason.
(2) Bank on your binaries failing or crashing. Sometimes you get lucky; most of the time you don't. There are new instructions, yes, and to support them would probably mean trapping the undefined instruction and having a software emulation of that instruction, which would be horribly slow, and the lack of demand for it means it is probably not well done or just not there. Optimization can use new instructions, but more than that, the bulk of the optimization that I am guessing you are talking about has to do with reordering the instructions so that the various pipelines do not stall. So if you arrange them to be fast on one generation of processor, they will be slower on another, because in the x86 family the cores change too much. AMD had a good run there for a while, as they would make the same code just run faster instead of trying to invent new processors that would eventually be faster when the software caught up. That is no longer true; both AMD and Intel are struggling to just keep chips running without crashing.
(3) Generally, yes. For example gcc is a horrible compiler, one size fits all and fits no one well; it can never and will never be any good at optimizing. For example gcc 4.x code is slower than gcc 3.x code for the same processor (yes, all of this is subjective, it all depends on the specific application being compiled). The in-house compilers I have used were leaps and bounds ahead of the cheap or free ones (I am not limiting myself to x86 here). Are they worth the price though? That is the question.
In general, because of the horrible new programming languages and gobs of memory, storage, and layers of caching, software engineering skills are at an all-time low. Which means the pool of engineers capable of making a good compiler, much less a good optimizing compiler, decreases with time; this has been going on for at least 10 years. So even the in-house compilers are degrading with time, or they just have their employees work on and contribute to the open source tools instead of having an in-house tool. Also the tools the hardware engineers use are degrading for the same reason, so we now have processors that we hope will just run without crashing and not so much try to optimize for. There are so many bugs and chip variations that most of the compiler work is avoiding the bugs. Bottom line, gcc has singlehandedly destroyed the compiler world.
(4) See (2) above. Don't bank on it. Your operating system that you want to run this on will likely not install on the older processor anyway, saving you the pain. For the same reason that the binaries optimized for your pentium III ran slower on your Pentium 4 and vice versa. Code written to work well on multi core processors will run slower on single core processors than if you had optimized the same application for a single core processor.
The root of the problem is that the x86 instruction set is dreadful. So many far superior instruction sets have come along that do not require hardware tricks to make them faster every generation. But the Wintel machine created two monopolies and the others couldn't penetrate the market. My friends keep reminding me that these x86 machines are microcoded such that you really don't see the instruction set inside. Which angers me even more: the horrible ISA is just an interpretation layer. It is kind of like using Java. The problems you have outlined in your questions will continue so long as Intel stays on top; if the replacement does not become the monopoly then we will be stuck forever in the Java model where you are on one side or the other of a common denominator: either you emulate the common platform on your specific hardware, or you are writing apps and compiling to the common platform.

Does it matter which microcontroller to use for a first-time embedded systems programmer?

I've experience in doing desktop and web programming for a few years. I would like to move on to doing some embedded systems programming. After asking the initial question, I wonder which hardware / software IDE I should start on...
Arduino + Arduino IDE?
Atmel AVR + AVR Studio 4?
Freescale HCS12 or Coldfire + CodeWarrior?
Microchip PIC+ MPLAB?
ARM Cortex-M3 + ARM RealView / WinARM
Or... doesn't matter?
Which development platform is the easiest to learn and program in (taking into consideration IDE usability)?
Which one is the easiest to debug if something goes wrong?
My goal is to learn about "how IO ports work, memory limitations/requirements incl. possibly paging, interrupt service routines." Is it better to learn one that I'll use later on, or should the high-level concepts carry over to most microcontrollers?
Thanks!
Update: how is this dev kit for a start? Comments? Suggestions?
Personally, I'd recommend an ARM Cortex-M3 based microcontroller. The higher-power ARM cores are extremely popular, and these low-power versions could very well take off in a space that is still littered with proprietary 8/16-bit cores. Here is a recent article on the subject: The ARM Cortex-M3 and the convergence of the MCU market.
The Arduino is very popular with hobbyists. Atmel's peripheral library is fairly common across processor types. So, it would smooth a later transition from an AVR to an ARM.
I don't mean to claim that an ARM is better than an AVR or any other core. Choosing an MCU for a commercial product usually comes down to peripherals and price, followed by existing code base and development tools. Besides, microcontrollers are generally much, much simpler than a desktop PC. So, it's really not that hard to move from one to another after you get the hang of it.
Also, look into FreeRTOS if you are interested in real-time operating system (RTOS) development. It's open source and contains a nice walk through of what an RTOS is and how they have implemented one. In fact, their walk-through example even targets an AVR.
Development tools for embedded systems can be very expensive. However, there are often open source alternatives for the more open cores like ARM and AVR. For example, see the WinARM and WinAVR projects.
Those tool-chains are based on GCC and are thus also available (and easier to use IMHO) on non-Windows platforms. If you are familiar with using GCC, then you know that there are an abundance of "IDE's" to suit your taste from EMACS and vi (my favorite) to Eclipse.
The commercial offerings can save you a lot of headaches getting set up. However, the choice of one will very much depend on your target hardware and budget. Also, some hardware supports direct USB debugging while other hardware may require a pricey JTAG adapter.
Other Links:
Selection Guide of Low Cost Tools for Cortex-M3
Low-Cost Cortex-M3 Boards:
BlueBoard-LPC1768-H ($32.78)
ET-STM32 Stamp Module ($24.90)
New Arduino to utilize an ARM Cortex-M3 instead of an AVR microcontroller.
Given that you already have programming experience, you might want to consider getting an Arduino and wiping out the firmware to do your own stuff with AVR Studio + WinAVR. The Arduino gives you a good starting point in understanding the electronics side of it. Taking out the Arduino bootloader would give you better access to the Atmel's innards.
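As a taste of what "better access to the innards" looks like, here is a minimal register-level sketch built with avr-gcc/avr-libc; it assumes an ATmega328P-class part with the usual LED on PB5 and a 16 MHz clock, and it covers two of the question's goals: driving an I/O port directly and writing an interrupt service routine.

#include <avr/io.h>
#include <avr/interrupt.h>

/* Timer1 compare-match ISR: toggles the LED roughly once per second. */
ISR(TIMER1_COMPA_vect)
{
    PORTB ^= _BV(PB5);               /* flip the pin by writing the port register */
}

int main(void)
{
    DDRB |= _BV(PB5);                /* data-direction register: PB5 as output */

    /* Timer1 in CTC mode, prescaler 1024; 16 MHz / 1024 / (15624 + 1) = 1 Hz. */
    TCCR1B = _BV(WGM12) | _BV(CS12) | _BV(CS10);
    OCR1A  = 15624;
    TIMSK1 = _BV(OCIE1A);            /* enable the compare-match interrupt */

    sei();                           /* global interrupt enable */
    for (;;)
        ;                            /* all the work happens in the ISR */
}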
To get at the goals you're setting out, I would also recommend exploring desktop computers more deeply through x86 programming. You might build an x86 operating system kernel, for instance.
ARM is the most widely used embedded architecture and covers an enormous range of devices from multiple vendors and a wide range of costs. That said there are significant differences between ARM7, 9, 11, and Cortex devices - especially Cortex. However if getting into embedded systems professionally is your aim, ARM experience will serve you well.
8 bit architectures are generally easier to use, but often very limited in both memory capacity and core speeds. Also because they are simple to use, 8-bit skills are relatively easy to acquire, so it is a less attractive skill for a potential employer because it is easy to fulfil internally or with less experienced (and therefore less expensive) staff.
However if this is a hobby rather than a career, the low cost of parts, boards, and tools, and ease of use may make 8 bit attractive. I would suggest AVR simply because it is supported by the free avr-gcc toolchain. Some 8 bit targets are supported by SDCC, another open source C compiler. I believe Zilog make their Z8 compiler available for free, but you may need to pay for the debug hardware (although this is relatively inexpensive). Many commercial tool vendors provide code-size-limited versions of their tools for evaluation and non-commercial use, but beware most debuggers require specialist hardware which may be expensive, although in some cases you can build it yourself if you only need basic functionality and low speeds.
Whatever you do, do take a look at www.embedded.com. If you choose ARM, I have used WinARM successfully on commercial projects, although it is not built for comfort! A good list of ARM resources is available here. For AVR, definitely check out www.avrfreaks.net.
I would only recommend Microchip PIC parts (at least the low-end ones) for highly cost sensitive projects where the peripheral mix is a good fit to the application; not for learning embedded systems. PIC is more of a branding than an architecture, the various ranges PIC12, 16, 18, 24, and PIC32 are very different from each other, so learning on one does not necessarily stand you in good stead for using another - often you even need to purchase new tools! That said, the dsPIC which is based on the PIC24 architecture may be a good choice if you wanted to get some simple DSP experience at the same time.
In all cases check out compiler availability (especially if C++ support is a requirement) and cost, and debugger hardware requirements, since often these will be the most expensive parts of your dev-kit, the boards and parts are often the least expensive part.
This is kind of a hard question to answer, as the ideal answer very much depends on what it is you're interested in learning.
If your goal is just to dive a little deeper into the inner workings of computing systems, I would almost recommend you forgo the embedded route and pick up a book on writing a Linux kernel module. Write something simple that reads a temperature sensor off the SMBus or something like that.
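For scale, the classic first kernel module is only a couple of dozen lines; this sketch assumes an out-of-tree build against your running kernel's headers with a one-line Kbuild makefile (obj-m += hello.o), and the file and message names are placeholders.

/* hello.c - minimal "hello world" Linux kernel module. */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Minimal example module");

static int __init hello_init(void)
{
    pr_info("hello: module loaded\n");
    return 0;                        /* a non-zero return would abort insmod */
}

static void __exit hello_exit(void)
{
    pr_info("hello: module unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);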
If you're looking at getting into high-level (phones, etc.) embedded application development, download the Android SDK; you can code in Java under Eclipse, and it even has a nice emulator.
If you're looking at getting into the "real" microcontroller space and really taking a look at low-level system programming, I would recommend you start on a very simple architecture such as an AVR or PIC, something without an MMU.
Diving into the middle ground, for example an ARM with an MMU and some sort of OS, be it Linux or otherwise, is going to be a bit of a shock, as without a background in both system programming and hardware interfacing I think the transition will be very rough if you plan to do much other than write very simple apps, counting button presses or similar.
Texas Instruments has released a very interesting development kit at a very low price: the eZ430-Chronos Development Tool contains an MSP430 with a display and various sensors in a sports watch, including a USB debug programmer and a USB radio access point, for $50.
There is also a wiki containing lots and lots of information.
I have already created a stackexchange proposal for the eZ430-Chronos Kit.
No, it doesn't matter if you want to learn how to program an embedded device. But you need to know the flow of where to start and where to go next, because there are many microcontrollers out there and you don't know which one to choose. So it is better to have a road map before starting.
In my view you should start with any AVR board (an ATmega328P-based Arduino board or a plain AVR board),
then move on to ARM microcontrollers - first an ARM7TDMI,
then an ARM Cortex-M3 board. This will give you an overall view, after which you can choose any board depending on what kind of project you are working on and what your requirements are.
Whatever you do, make sure you get a good development environment. I am not a fan of Microchip's development tools even though I like their microcontrollers (I have been burned too many times by MPLAB + ICD, too much hassle and dysfunction). TI's 2800 series DSPs are pretty good and have an Eclipse-based C++ development environment which you can get into for < US$100 (get one of the "controlCARD"-based experimenter's kits like the one for the 28335) -- the debugger communications link is really solid; the IDE is good although I do occasionally crash it.
Somewhere out there are ICs and boards that are better; I'm not that familiar with the embedded microcontroller landscape, but I don't have much patience for poor IDEs with yet another software tool chain that I have to figure out how to get around all the bugs.
Some recommend the ARM. I'd recommend it, not as a first platform to learn, but as a second platform. ARM is a bit complex as a platform to learn the low-level details of embedded, because its start-up code and initialisation requirements are more complicated than many other micros. But ARM is a big player in the embedded market, so well worth learning. So I'd recommend it as a second platform to learn.
The Atmel AVR would be good for learning many embedded essentials, for 3 main reasons:
Architecture is reasonably straight-forward
Good development kits available, with tutorials
Fan forum with many resources
Other micros with development kits could also be good—such as MSP430—although they may not have such a fan forum. Using a development kit is a good way to go, since they are geared towards quickly getting up-and-running with the micro, and foster effective learning. They are likely to have tutorials oriented towards quickly getting started.
Well, I suppose the development kits and their tutorials are likely to gloss over such things as bootloaders and start-up code, in favour of getting your code to blink the LED as soon as possible. But that could be a good way to get started, and you can explore the chain of events from "power-on" to "code running" at your pace.
I'm no fan of the PICs, at least the PIC16s, due to their architecture. It's not very C-friendly. And memory banks are painful.
It does matter; you need to gradually acquire experience, starting with simpler systems. Note that by simpler I don't mean less powerful, I mean ease of use, ease of setup, etc. In that vein I would recommend the following (I have no vested interest in any of the products, I just found them the best):
I've started using one of these (MBED developer board). The big selling points for me were that I could code in C or C++, the straightforward connection via USB, and a slick online development environment (no local tool installation required at all!).
http://mbed.org/
Five minutes after opening the box I had a sample blinky program (the 'hello world' of the embedded world) running the following:
#include "mbed.h"
DigitalOut myled(LED1);
int main()
{
while(1)
{
myled = 1;
wait(0.2);
myled = 0;
wait(0.2);
}
}
That's it! Above is the complete program!
It's based on an ARM Cortex-M3, fast and with plenty of memory for embedded projects (100 MHz, 256 KB flash & 32 KB RAM). The online dev tools have a very good library and plenty of examples, and there's a very active forum. Plenty of help on connecting devices to the MBED, etc.
Even though I have plenty of experience with embedded systems (ARM7/9, Renesas M8/16/32, Coldfire, Zilog, PIC, etc.) I still found this a refreshingly easy system to get to grips with while having serious capability.
After initially playing with it on a basic breadboard I bought a base board from these guys: http://www.embeddedartists.com/products/lpcxpresso/xpr_base.php?PHPSESSID=lj20urpsh9isa0c8ddcfmmn207. This has a pile of I/O devices (including a miniature OLED and a 3-axis accelerometer). From the same site I also bought one of the LPCXpresso processor boards, which is cheap, with less power/memory than the MBED but perfect for smaller jobs (it still hammers the crap out of PIC/ATmega processors). The base board supports both the LPCXpresso and the MBED. Purchasing the LPCXpresso processor board also got me an attached JTAG debugger and an offline dev environment (Code Red's GCC/Eclipse-based dev kit). This is much more complex than the online MBED dev environment but is a logical progression after you've gained experience with the MBED.
With reference to my original point, note that the MBED controller is much more capable than the LPCXpresso controller but is much simpler to use and learn with.
I use Microchip PICs; it's what I started on. I mainly got going on it due to the '123 Microcontroller Projects for the Evil Genius' book. I took a microprocessors class at school for my degree and learned a bit about interrupts and timing and things; this helped a ton with my microcontrollers. I suppose some of the other programmers etc. may be better/easier, but for $36 for the PICkit 1, I'm too cheap to go buy another one... and frankly, without using them I don't know if they are easier/better. I like mine and recommend it every chance I get, and it took me forever to really actually look at it, but I was finally able to program another chip off-board with ICSP. I don't know how other programmers do it, but for me that's the nicest thing: a 5-wire interface and you're programmed. Can't beat that with a stick...
I've only used one of those.
The Freescale is a fine chip. I've used HC-something chips for years for little projects. The only caveat is that I wouldn't touch CodeWarrior Embedded with a 10-foot pole. You can find little free C compilers and assemblers (I don't remember the name of the last one I used) that do the job just fine. CodeWarrior was big and confusing, and regardless of how much I knew about the chip architecture and C programming, it always seemed to only make things harder. If you've used CodeWarrior on the Mac back in the old days and think CW is pretty neat, well, it's not at all like that. CW Embedded looks vaguely similar, but it works very differently, and not very well.
A command-line compiler is generally fine. Professionals who can shell out the big bucks get expensive development environments, and I'm sure they make things better, but without that it's still far better than writing assembly code for a desktop PC in 1990, and somehow we managed to do that just fine. :-)
You might consider a RoBoard. Now, this board may not be what you are looking for in terms of a microcontroller, but it does have the advantage of being able to run Windows or DOS and thus you could use the Microsoft .NET or even C/C++ development tools to fiddle around with things like servos or sensors or even, what the heck, build a robot! It's actually kinda fun.
There's also the Axon II, which has the ATmega640 processor.
Either way, both boards should help you achieve your goal.
Sorry for the robotics focus, just something I'm interested in and thought it may help you too.
I use PICs, but would consider Arduino if I chose today. But from your goals:
how IO ports work
memory limitations/requirements
interrupt service routines
I wonder if your best bet is just to hack on the Linux kernel?
BBC Micro Bit
https://en.wikipedia.org/wiki/Micro_Bit
This cheap little board (~20 pounds) was created by ARM Holdings as an educational device, and 1M units were given out for free to UK students.
It contains an ARM Cortex-M0, the smallest ARM core of all.
I recommend it as a first micro-controller board due to its wide availability, low cost, simplicity, and the fact that it introduces you to the ARM architecture, which has many more advanced boards also available for more serious applications.