Using OpenOCD to extract ETB traces on Cortex-M (STM32F4) - embedded

The STM32F4 Discovery board (Cortex-M4) has an ETB, a buffer storing instruction traces. How can I use OpenOCD and the on-board ST-Link debugger to pull the traces out of the ETB?
I am a little confused between SWO and SWD. What should I be using? Also, do I need any additional hardware for extracting traces?
Thank you

I'm afraid I've never heard of an STM32F4 including an Embedded Trace Buffer (ETB) in the implemented subset of the ARM core and its CoreSight features. I think this is because the ETB is an optional feature, and ST has decided not to configure/implement this ETB option in its STM32F4 controllers and the ARM core they embed.
I looked up the programming/reference manuals and the datasheet of an upper-level representative of the STM32F4xx family, and I didn't find anything about the ETB, which seems to confirm this assumption.
Now, ETB is not the only option if one wants to stream trace data out of one's MCU:
STM32F4 controllers all have an Instrumentation Trace Macrocell (ITM), which can alternatively deliver a software-defined character output stream or snapshots of data values or the program counter, collected either at your breakpoints or just periodically, with assistance from the Data Watchpoint and Trace (DWT) unit.
You can use the ITM:
to instrument the application with character output (printf())
to profile your application
to inspect certain properties/state flows of your software by tracing program breakpoints or data watchpoints
The ITM is usable through the SWO pin, using adapter hardware like any version of ST-Link, J-Link, ULINK, etc.
It is a proper trace interface since it works without stopping the CPU at breakpoints, so examination won't break your system's real-time properties.
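For reference, getting printf() onto the SWO pin usually boils down to retargeting the toolchain's low-level character output to the CMSIS function ITM_SendChar(). A minimal sketch (the device header name is an assumption; adjust for your part):

#include <stdio.h>
#include "stm32f4xx.h"   /* device header; pulls in the CMSIS core (ITM_SendChar) */

/* Retarget low-level character output to ITM stimulus port 0.
   With ARM's microlib this fputc() is enough for printf();
   with GCC/newlib you would implement _write() instead. */
int fputc(int ch, FILE *f)
{
    (void)f;
    return (int)ITM_SendChar((uint32_t)ch);
}

The debug adapter then has to have SWO capture enabled at the matching clock rate (e.g., in your IDE's SWV/ITM console).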
Many STM32F4 controllers (AFAIK, those with >= 100 pins) include an Embedded Trace Macrocell (ETM), which is able to trace the program counter (PC) and data of every CPU cycle, so you can use this one to trace the entire control flow (and data flow) of your controller, also without stopping it at any breakpoint.
The humongous amount of data to be traced (make sure you have a free USB3 port...) can only be delivered in a useful way through the synchronous port interface around the GPIOE group (alternate functions TRACECLK+TRACED0/1/2/3 => 5 pins in total), which is connected to the Trace Port Interface Unit (TPIU) next to the ETM.
In order to use this technology, you need the more expensive variants of debug adapters like j-Trace, uLink-Pro or Lauterbach. The cheapest ETM-capable adapter I'm aware of (haven't used it yet, though) is QTrace by PDQlogic starting around £379. The others are available for about 1-4 k£/k€/k$.
The way your question sounds tells me that you probably just started programming STM32s. Therefore I recommend getting a development board with an embedded ST-Link on it. This is the cheapest way to get (SWD debugging running first, and then) SWO tracing running. The Atollic blog has a nice intro on how to do that quickly.


How to implement hardware interrupts in uCOS II and TM4C123G (ARM M4) MCU?

Background:
I am using uCOS II, Keil uVision 5, and a TIVA board with the TM4C123GH6PM MCU on it. I was given the port for uCOS II as well as a blank project file to get started. I wrote the tasks needed and the program works correctly, but now I am interested in implementing interrupts and trying to understand how they can coexist with an RTOS. This is all done in C.
Issue:
Interrupts do not work; they simply don't fire. There are instances where the other tasks won't execute either. The core issue is that I don't really understand how interrupts can coexist with the RTOS. I've written code (in both assembly and C) on bare metal where interrupts work perfectly, and I fully understand how they work when there is no layer between the code and the CPU.
What I've Tried:
I read the book and reference manual that came with uCOS-II and searched for ways to implement interrupts. No mention whatsoever; the only thing said about interrupts is how they interact with the scheduler, so interrupts are only covered in the theoretical domain.
I asked on the Micrium (original vendor) forum but got no reply; it seems like a dead forum.
I looked at the libraries included with the uCOS port and found something useful:
bsp_int is the library that deals with the interrupts. BSP stands for Board Support Package and is intended to facilitate the interaction between the software and the hardware.
The library has functions to register an interrupt and enable it. The RTOS uses its own table of ISR handlers mapped to the NVIC of the CPU. All handlers are filtered through a generic handler. The two useful functions from this library are:
bsp_intVectSet, which takes the interrupt trigger ID (e.g., bsp_int_id_gpiof) and a pointer to the interrupt handler and registers it
bsp_intEn which takes the interrupt ID and enables it
The bsp_int library is included in bsp.c which calls the initialization function (from bsp_int) for interrupts (bsp_IntInit())
The bsp.h file is included in the main application file (app.c)
app.c's main() is the entry point of the program. main() disables interrupts, initializes uCOS (i.e., the kernel), creates the first/starting task called AppTaskStart, and starts multitasking (i.e., it gives control to the RTOS and the function never returns). I'm assuming the kernel re-enables interrupts, since it needs those to run.
So the way the RTOS works (to my understanding) is that it hijacks the SysTick timer, so at every clock tick the kernel is called and is able to schedule the tasks.
AppTaskStart, which is the very first task to execute within the kernel domain, calls bsp_init (in which, bsp_IntInit is called to initialize the interrupt table and more) and performs other initialization tasks
The way I set up interrupts without a kernel before was using the Tivaware library (in C) provided by TI. It has functions for creating interrupts, specifying the trigger (e.g., rising/falling edge, timer overflow, etc.), and enabling them. This method works, and I thought it is what I should be using to set up the interrupt I want.
So I used the Tivaware library to set up interrupts on one of the GPIO ports (to which mechanical switches are connected) on the rising edge. The code for this, as well as other code to start the Port F peripheral, set the switch pins to input, and enable pull-ups, is included in bsp_init (bsp.c), which is called from AppTaskStart, which is called from main. So far everything works perfectly: the RTOS initializes, and all its tasks execute accordingly. When I try to move the code directly into main and flash the program onto the board, the RTOS initializes (LEDs blink) but then the tasks don't execute. Any ideas why that might be?
If I add the code to also enable and register the interrupt for when the switch is closed in the same function, using code from the Tivaware library, the RTOS does not initialize.
Do I need to set up/register/enable interrupts using the Tivaware library as well as register and enable them using the board support package (BSP) library? The way I understand it so far is that the BSP is registering/enabling interrupts for the kernel only, whereas the Tivaware code is enabling them by writing directly to the registers, so the latter is needed to set up the CPU portion of the interrupts and the former is needed to set up the OS portion. But I don't know. I really don't understand how they've designed interrupt handling under uCOS II. They do specify how the interrupt handler should be written and what macros to use, but nothing else.
What should I try next? Does anybody have any experience working with these two components (the RTOS and the board)?
I am just stuck at this point, and I've been playing with the code, moving stuff around, trying to find a clue/lead to solving this issue. I can't even debug the RTOS, because uVision does not support uCOS, and I can't use step debugging because interrupts are firing at every clock tick and the PC is changing constantly, so the IDE can't follow it.
I know IAR Embedded Workbench has support for uCOS-II and I have the app on my laptop and I tried setting up a project but I was only given a port/starter project for Keil and I don't know how to set one up for IAR EW. The only ports on Micrium's website are for the TM4C129 series and I tried using that to start an IAR EW project but I couldn't get it to work (libraries not being linked/missing files).
Thank You!
Does anybody have any experience working with these two components (the RTOS and the board)?
I'm afraid I haven't worked with uCOS (but with other OSes, mainly SYS/BIOS and FreeRTOS), and I haven't worked with Tiva (but with the Sitara AM335x) yet. Still, I think some hints below may be helpful for you (and apply in spite of the different implementations you are using).
What should I try next?
These are the steps I recommend you consider. You can put them into any order you find most helpful.
Interrupt priorities of ISRs that call RTOS library APIs must not be higher than the level that is taken into account by the RTOS; otherwise the RTOS-internal state may get corrupted, and anything can happen. Please check your OS documentation. (A sketch of a kernel-aware ISR follows after this list.)
Please verify the position of the interrupt vector table and its contents:
Does every vector table entry point to one of the ISR wrapper handlers provided by the RTOS, or do you also find "independent" ISR implementations? If so, what do the latter do?
If you find pointers into third-party libraries you don't have the code for, don't give up. Those can be just as important...
Even more important than including the right headers for the bsp_Int... APIs is that the interrupt management of all software components runs through one single API, e.g., the bsp_Int... one.
Your assumptions about app.c/main() sound reasonable. Please make sure that you also know about every component that accesses interrupts indirectly.
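To illustrate the last points, here is roughly what a kernel-aware GPIO ISR and its registration might look like, using the bsp_intVectSet()/bsp_intEn() names from your description plus TI driverlib calls. This is a sketch against assumed signatures, not tested code; in particular, if your BSP's generic handler already performs OSIntEnter()/OSIntExit() around the registered function, the function itself must omit them:

#include <stdint.h>
#include <stdbool.h>
#include "ucos_ii.h"           /* uC/OS-II: OSIntEnter()/OSIntExit()          */
#include "driverlib/gpio.h"    /* TivaWare: GPIOIntStatus()/GPIOIntClear()    */
#include "inc/hw_memmap.h"     /* TivaWare: GPIO_PORTF_BASE                   */
#include "bsp_int.h"           /* assumed header for bsp_intVectSet/bsp_intEn */

static void App_GPIOF_ISR(void)
{
    uint32_t status;

    OSIntEnter();                              /* tell the kernel an ISR is running */
    status = GPIOIntStatus(GPIO_PORTF_BASE, true);
    GPIOIntClear(GPIO_PORTF_BASE, status);     /* acknowledge at the peripheral     */
    /* do minimal work here, or signal a task, e.g. via OSSemPost() */
    OSIntExit();                               /* kernel may reschedule on exit     */
}

void App_SwitchIntInit(void)
{
    /* Configure the trigger with Tivaware... */
    GPIOIntTypeSet(GPIO_PORTF_BASE, GPIO_PIN_4, GPIO_RISING_EDGE);
    GPIOIntEnable(GPIO_PORTF_BASE, GPIO_INT_PIN_4);
    /* ...but route the vector and the NVIC enable through the BSP,
       so that all interrupt management runs through one API: */
    bsp_intVectSet(bsp_int_id_gpiof, App_GPIOF_ISR);
    bsp_intEn(bsp_int_id_gpiof);
}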
AppTaskStart, which is the very first task to execute within the kernel domain, calls bsp_init (in which, bsp_IntInit is called to initialize the interrupt table and more) and performs other initialization tasks
Please check what happens if you place a breakpoint at the top of every task function. Then you should be able to watch all tasks start and run into their breakpoints once.
The way I set up interrupts without a kernel before was using the Tivaware library (in C) provided by TI. It has functions for creating interrupts, specifying the trigger (e.g., rising/falling edge, timer overflow, etc.), and enabling them. This method works, and I thought it is what I should be using to set up the interrupt I want.
You should make sure that the Tivaware library only uses interrupts in a way that is compatible with your RTOS. You can verify this by reading the manual or the sources.
So I used the Tivaware library to set up interrupts on one of the GPIO ports (to which mechanical switches are connected) on the rising edge. The code for this, as well as other code to start the Port F peripheral, set the switch pins to input, and enable pull-ups, is included in bsp_init (bsp.c), which is called from AppTaskStart, which is called from main. So far everything works perfectly: the RTOS initializes, and all its tasks execute accordingly. When I try to move the code directly into main and flash the program onto the board, the RTOS initializes (LEDs blink) but then the tasks don't execute. Any ideas why that might be?
Could it be that an electronic problem at one of the controller pins connected to the interrupt starts triggering that interrupt all the time?
If I add the code to [...]
Have you tried creating a minimal reproducible example? When you do this, you can enhance the effect by simultaneously performing rubber duck debugging.
Do I need to set up/register/enable interrupts using the Tivaware library as well as register and enable them using the board support package (BSP) library? The way I understand it so far is that the BSP is registering/enabling interrupts for the kernel only, whereas the Tivaware code is enabling them by writing directly to the registers, so the latter is needed to set up the CPU portion of the interrupts and the former is needed to set up the OS portion. But I don't know. I really don't understand how they've designed interrupt handling under uCOS II. They do specify how the interrupt handler should be written and what macros to use, but nothing else.
This sounds dangerous. I haven't worked with Tiva yet, but I have with another TI chip (the AM335x). There we had a similar situation: different libraries accessing different/overlapping parts of the same system resource by means of different abstraction layers. The situation only started improving when we tidied up the mess of abstraction layers ignoring each other and ported some code to a common abstraction layering scheme.
And some PS:
You can write your ISRs in C or assembler, as you like. Depending on the quality of your toolchain and optimisation settings, assembler may yield better performance (or none at all), and by calling C APIs from assembler, some programmers tend to make new mistakes. I'd recommend staying within C until you know in detail what is happening around your OS and IRQs.

Reading live RAM variables from a Micro controller in VB.net

I want to read the global variables via the JTAG port, live, when a program is running on the microcontroller. Is it possible?
JTAG defines only a physical interface, it does not describe the on-chip debug capabilities of a particular processor which may or may not support access during execution.
Moreover, whether it can be done in VB is not really the issue; the important issue is what hardware device and/or I/O port you are using for the JTAG interface, and whether a driver and API to access it via .Net are available. That said, VB.Net is not the first language I'd choose for this in any case.
A good place to start perhaps is OpenOCD, though it is not .Net specific.
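As a concrete starting point: OpenOCD listens on a telnet command port (4444 by default), and a .Net program could simply open a TCP connection to that port, send commands, and parse the replies. A manual session might look like this (the address is hypothetical; take your variable's address from the linker map file):

$ telnet localhost 4444      # OpenOCD's default command port
> mdw 0x20000400 1           # read one 32-bit word at the variable's address

On many Cortex-M targets such reads work even while the core is running; on other targets OpenOCD may need to halt and resume around the access.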
"Almost-Live" is possibly doable, depending on the JTAG implementation. Often JTAG activity which reads memory does so by stealing cycles from the micro (or sometimes even inserting instructions into the pipeline). I'm not sure there's a micro which allows completely transparent access to memory over JTAG.
"All you need to do" is understand the JTAG implementation, know where the variable is located and issue a "memory read" command by wiggling the JTAG pins in the appropriate fashion. This is not a small task, which is why professional engineers are willing to pay (sometimes large amounts of) money for tools which perform this task.
Often the free (limited) toolchains the vendors provide can perform this also.
Yes, I suppose it is possible. But you'll need to drive the JTAG port (that sounds painful!) and know exactly where the data is stored on the chip, and what the formatting is.

What is the best way to implement my program for Keil MCB1700 evaluation board?

I want to develop a program for MCB1700 evaluation board.
Client software on the PC reads a picture from the HDD.
It then sends the picture to the MCB1700 evaluation board through a socket (Ethernet).
A server on the MCB1700 receives the picture from the PC through the socket connection and displays it on the LCD.
The server must also perform tasks such as:
To save the picture to a USB stick;
To read the picture from a USB stick and send it to the client through the socket;
To send and receive information over CAN;
COM logging;
etc.
The socket connection can be implemented with the help of the CMSIS and RL-ARM libraries.
But, as far as I understood, in both cases the software has to listen for incoming TCP connections and handle network events in an endless loop - all of Keil's examples are based on this principle.
I always thought that using endless loops is a poor approach for embedded programming.
Moreover, I read such interesting statement
"it is certainly possible to create real-time programs without an RTOS
(by executing one or more tasks in a loop)"
http://www.keil.com/support/man/docs/rlarm/rlarm_ar_artxarm.htm
So, as I understood, it is normal practice to execute a lot of tasks in a loop?
while (1) {   /* "super-loop": each task runs to completion, in turn */
    task1();
    task2();
    ...
    taskN();
}
I think that it is better to handle all events by interrupts.
Is it possible to use the socket connection of the CMSIS and RL-ARM libraries and organize all my software around interrupt handling?
My server (on the MCB1700) has to perform a lot of tasks. I guess I should use the RTX RTOS in my software, shouldn't I? Or is it better to implement my software without RTX?
Simple real-time systems often operate in a "big-loop" architecture, with some time-critical events handled by interrupts. It is not a "bad" architecture, but it is somewhat inflexible, scales poorly, and any change may affect the timing of any or all parts of the system.
It is not true that RL-TCPnet must work this way, but it is designed to run stand alone, and the examples work that way so that there are no dependencies on other libraries for the widest applicability. They are only examples and are intended to demonstrate RL-TCPnet and nothing else. In that sense the examples are not realistic and should be adapted to your requirements.
You might devise a system where all your application code runs in interrupt handlers and the network stack is serviced in an endless loop in the thread context, but architecturally that is probably far worse than the "big-loop" architecture, since you may end up with very long interrupt handlers, with the higher-priority ones starving the lower-priority ones and ruining their real-time response. You end up with a system that is difficult to schedule satisfactorily. Moreover, not all library routines are suitable for execution in an interrupt handler, so it would be rather restrictive.
It is possible to service the network stack in a low-priority thread in an RTOS. The framework for such operation is described in the documentation in the section Using RL-TCPnet with RTX kernel. This will require you to understand and use the RL-RTX kernel library, but given your reservations about "big loop" code, and the description of the tasks your system must perform, it sounds like that is what you want to do in any case. If you choose a different RTOS, RL-TCPnet can work in a similar way with any RTOS.
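For orientation, the structure described in that documentation section boils down to a low-priority task polling the stack plus a periodic task driving its timers. A sketch from memory (tick rate and priorities are assumptions; take the exact template from the RL-ARM manual):

#include <RTL.h>   /* RL-ARM: RTX kernel API and RL-TCPnet prototypes */

__task void timer_task(void)
{
    os_itv_set(10);        /* assuming a 10 ms kernel tick => 100 ms interval */
    for (;;) {
        timer_tick();      /* drive RL-TCPnet's protocol timers */
        os_itv_wait();
    }
}

__task void net_task(void)
{
    init_TcpNet();                   /* bring up RL-TCPnet        */
    os_tsk_create(timer_task, 30);   /* priority is an assumption */
    for (;;) {
        main_TcpNet();               /* service the network stack */
        os_tsk_pass();               /* yield to other ready tasks */
    }
}

Your application tasks (LCD, USB stick, CAN, logging) then run as further RTX tasks at appropriate priorities instead of being squeezed into one big loop.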

On reset what happens in embedded system?

I have a few questions regarding reset due to power-up:
1. As I know, the microcontroller is hardwired to start at some particular memory location, say 0000H, on power-up. Is the interrupt service routine for reset (initialization of the stack pointer, program counter, etc.) written at 0000H, or does 0000H hold the reset address (say 7000H) so that the microcontroller jumps to address 7000H, where the initialization of the stack and PC is written?
2. Who writes this reset service routine? Is it the manufacturer of the microcontroller chip (Intel, Microchip, etc.), or can any programmer change it (for example, the programmer changes the PC to 4000H instead of 7000H on power-up reset, so the first instruction is fetched from 4000H instead of 7000H)?
3. How are the stack pointer and program counter initialized to their respective initial addresses, given that on power-up the microcontroller is not yet in a state to put addresses into the stack pointer and program counter registers (no initialization is done until the reset service routine runs)?
4. What should be the steps in the reset service routine, considering all possibilities?
With reference to your numbering:
The hardware reset process is processor dependent and will be fully described in the data sheet or reference manual for the part, but your description is generally the case - different architectures may have subtle variations.
While some microcontrollers include a ROM based boot-loader that may contain start-up code, typically such bootloaders are only used to load code over a communications port, either to program flash memory directly or to load and execute a secondary bootloader to RAM that then programs flash memory. As far as C runtime start-up goes, this is either provided with the compiler/toolchain, or you write it yourself in assembler. Normally even when start-up code is provided by the compiler vendor, it is supplied as source to be assembled and linked with your application. The compiler vendor cannot always know things like memory map, SDRAM mapping and timing, or processor clock speed or what oscillator crystal is used in your hardware, so the start-up code will generally need customisation or extension through initialisation stubs that you must implement for your hardware.
On ARM Cortex-M devices the initial PC and stack pointer are in fact loaded by hardware: they are stored at the reset address and loaded on power-up. However, in the general case you are right: the reset address either contains the start-up code or a vector to the start-up code; on pre-Cortex ARM architectures, the reset address actually contains a jump instruction rather than a true vector address. Either way, the start-up code for a C/C++ runtime must at least initialise the stack pointer, initialise static data, perform any necessary C library initialisation, and jump to main(). In the case of C++ it must also execute the constructors of any global static objects before calling main().
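To make the Cortex-M case concrete, a minimal GNU-toolchain-style start-up looks roughly like this (section and symbol names are linker-script conventions and vary between projects; treat it as a sketch):

#include <stdint.h>

extern uint32_t _estack;                                /* top of stack, from the linker script */
extern uint32_t _sidata, _sdata, _edata, _sbss, _ebss;  /* linker symbols */
extern int main(void);

void Reset_Handler(void);

/* The first two words at the reset address: hardware loads word 0 into SP
   and word 1 into PC before the first instruction executes. */
__attribute__((section(".isr_vector"), used))
static void (* const vector_table[])(void) = {
    (void (*)(void))&_estack,   /* initial stack pointer   */
    Reset_Handler,              /* initial program counter */
};

void Reset_Handler(void)
{
    uint32_t *src = &_sidata, *dst = &_sdata;
    while (dst < &_edata) *dst++ = *src++;          /* copy .data from flash */
    for (dst = &_sbss; dst < &_ebss; ) *dst++ = 0;  /* zero .bss */
    main();
    for (;;) ;                                      /* trap if main() returns */
}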
The processor cores normally have, as you say, a starting address of some sort: either a table with a list of addresses or, as on ARM, a place where instructions are executed. What is wrapped around that core, but within the chip, can vary. Cores that are not specific to the chip vendor (8051, MIPS, ARM, XScale, etc.) are going to have a much wider range of different answers. Some microcontroller vendors, for example, will look at strap pins, and if a strap is wired a certain way when reset is released, the part executes from a special boot flash inside the chip - a bootloader that you can, for example, use to program the user boot flash. If the strap is not tied that certain way, it boots your user code. One vendor I know of still boots their bootloader flash; if the vector table has a valid checksum, they jump to the reset vector in your vector table, otherwise they sit in their bootloader mode waiting for you to talk to them.
When you get into the bigger processors - non-microcontrollers, where software lives outside the processor, either on a boot flash (a separate chip from the processor) or in some RAM that is managed somehow before reset - those usually follow the rule for the core: start at address 0xFFFFFFF0, or start at address 0x00000000. If there is garbage there, oh well, fire off the undefined-instruction vector; if that is garbage too, just hang there or sit in an infinite loop calling the undefined-instruction vector. This works well for an ARM, for example: you can build a board with a boot flash that is erased from the factory (all 0xFFs), then use JTAG to stop the ARM and program the flash the first time, and you don't have to unsolder or socket or pre-program anything. So long as your bootloader doesn't hang the ARM, you can have an unbrickable design. (Actually, you can often hold the ARM in reset and still get at it with the JTAG debugger, and not worry about bad code messing with the JTAG pins or hanging the ARM core.)
The short answer: how many different processor chip vendors have there been? There are many different solutions; as many as you can think of, and more, have been deployed. Placing a reset handler address in a known place in memory is the most common, though.
EDIT:
Questions 2 and 3: if you are buying a chip, some of the microcontrollers have this protected bootloader, but even with that you normally write the boot code that will be used by the product. Part of that boot code initializes the stack pointers, prepares memory, and brings up parts of the chip - all those good things. Sometimes chip vendors will provide examples. If you are buying a board-level product, then you will often find a board support package (BSP) which has working example code to bring up the board and perhaps do a few things. The BeagleBoard, for example, or the OpenRD or embeddedarm.com boards come with a bootloader (U-Boot or other), and some already have Linux pre-installed. On boards like that, the user usually just writes some Linux apps/drivers and adds them to the BSP, but you are not limited to that; you are often welcome to completely rewrite and replace the bootloader. And whoever writes the bootloader has to set up the stacks and bring up the hardware, etc.
On systems like the Game Boy Advance or the NDS, the vendor has some startup code that calls your startup code, so the stack and such may already be set up when they hand off to you; much of the system may be up, and you just get to decide how to slice up the memories - where you want your stack, data, program, etc.
Some vendors want to keep this stuff controlled or secret; others do not. In some cases you may end up with a board or chip with no example code, just some datasheets and reference manuals.
If you want to get into this business, though, you need to be prepared to write this startup code (in assembler), which may call some C code to bring up the rest of the system, which then starts the main operating system or application or whatever. Microcontrollers sound like what you are playing with; the answers to your questions are in the chip vendors' user guides, and some vendors are better than others. Search for the words "reset" or "boot" in the document to try to figure out what their boot schemes are. I recommend you use "dollar votes" to choose the better vendors: a vendor with bad docs, secret docs, or bad support - don't give them your money. Spend your money on vendors with freely downloadable, well-written docs, with well-written examples and/or user forums with full-time employees patrolling around answering questions. There are times when the docs are not available except to serious, paying customers; it depends on the market. Most general-purpose embedded systems, though, are openly documented. The quality varies widely, but the docs, etc., are there.
It depends completely on the controller/embedded system you use. The ones I've used in game development have the instruction pointer point at a starting address in RAM. The bootstrap code supplied with the compiler initializes static/const memory, sets the stack pointer, and then jumps execution to a main() routine of some sort. Older systems also started at a fixed address, but you manually had to set up the stack, starting vector table, and other stuff in assembler. A common name for the starting assembler file is crt0.s in the projects I've worked on.
So, 1. You are correct. The microprocessor has to start at some fixed address.
2. The reset service routine can be supplied by the manufacturer or compiler creator, or you can write one yourself, depending on the complexity of the system in question.
3. The stack and initial program counter are usually handled by some sort of bootstrap routine that quite often can be overridden with your own code. See above.
Last: the steps will depend on the chip. If there is a power interruption of any sort, RAM may be scrambled, so all ISR vector tables and startup code should be rewritten, and the app should be run as if it had just powered up. But read your documentation! I'm sure there is platform-specific stuff there that will answer these questions for your specific case.

USB for embedded devices - designing a device driver/protocol stack

I have been tasked with writing a device driver for an embedded device which will communicate with the microcontroller via the SPI interface. Eventually, the USB interface will be used to download updated code externally and will be used during the verification phase.
My question is, does anyone know of a good reference design or documentation or online tutorial which covers the implementation/design of the USB protocol stack/device driver within an embedded system? I am just starting out and reading through the 650 page USB v2.0 spec is a little daunting at the moment.
Just as an FYI, the microcontroller that I am using is the Freescale 9S12.
Mark
Based upon goldenmean's (-AD) comments I wanted to add the following info:
1) The embedded device uses a custom executive and makes no use of a COTS OS or RTOS.
2) The device will use interrupts to indicate data is ready to be retrieved from the device.
3) I have read through some of the docs regarding Linux, but since I am not at all familiar with Linux it isn't very helpful at the moment (though I am hoping it will be very quickly).
4) The design approach, for now at least, is to write a device driver for the USB device; a USB protocol layer (I/O) would then sit on top of the device driver to interpret the data. I assume this is the best approach, though I could be wrong.
Edit - A year later
I just wanted to share a few items before they vanish from my mind, in case I never work on a USB device again. I ran into a few obstacles when developing the code and getting it up and running for the first time.
The first problem I ran into was that when the USB device was connected to the host (Windows in my case), the host issues a reset request. The USB device would reset and clear its interrupt enable flags. I hadn't read the literature closely enough to know this was happening, so I was never receiving the Set-Up Request interrupt. It took me quite a while to figure this out.
The second problem I ran into was not handling the Set-Up Request for Set_Configuration properly. I was handling it, but I was not processing the request correctly, in that the USB device was not sending an ACK when this Set-Up Request came in. I eventually found this out by using a hardware USB protocol analyzer.
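For anyone hitting the same thing: a no-data control write such as Set_Configuration is completed by a zero-length IN packet in the status stage, which is the ACK the host waits for. A rough sketch (the helper names are illustrative, not from any particular stack or the MAX3420 driver):

#include <stdint.h>

typedef struct {                 /* USB 2.0 ch. 9 setup packet; fields are naturally aligned */
    uint8_t  bmRequestType, bRequest;
    uint16_t wValue, wIndex, wLength;
} usb_setup_t;

#define USB_REQ_SET_CONFIGURATION 9u   /* from the USB 2.0 spec */

extern void ep0_send_status_ack(void); /* queue a zero-length IN packet */
extern void ep0_stall(void);           /* STALL unsupported requests    */

static uint8_t current_config;

static void handle_setup(const usb_setup_t *s)
{
    switch (s->bRequest) {
    case USB_REQ_SET_CONFIGURATION:
        current_config = (uint8_t)s->wValue;
        ep0_send_status_ack();  /* status-stage ACK the host expects */
        break;
    default:
        ep0_stall();
        break;
    }
}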
There were other issues that I ran into, but these were the two biggest ones that took me quite a while to figure out. The other issue I had to worry about is big-endian and little-endian, Freescale 9S12 vs USB data format (Intel), respectively.
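For the endianness part, the safe habit is to assemble USB's little-endian multi-byte fields byte-by-byte instead of casting buffers, e.g.:

#include <stdint.h>

/* USB multi-byte fields are little-endian; on a big-endian CPU such as
   the 9S12, build them from individual bytes instead of type-punning. */
static uint16_t usb_get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* e.g. wLength sits at offset 6 of the 8-byte setup packet:
   uint16_t wLength = usb_get_le16(&setup_buf[6]); */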
I ended up building the USB device driver similar to UART device drivers I had done in the past. I have posted the code to this at the following URL.
http://lordhog.wordpress.com/2010/12/13/usb-drive
I tend to use structures a lot, so people may not like my code, since structures are not as portable as using #defines (e.g., MAX3420_SETUP_DATA_AVAIL_INT_REQR 0x20), but I like them since they make the code more readable to me. If anyone has questions regarding it, please feel free to e-mail me and I can try to give some insight into it. The book "USB Complete: The Developer's Guide" was helpful, as long as you knew which areas to concentrate on. This was a simple application and only used low-speed USB.
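To illustrate that trade-off (the mask value follows the #define quoted above; the bit layout in the struct is made up for the example):

#include <stdint.h>

#define MAX3420_SETUP_DATA_AVAIL_INT_REQR 0x20u  /* mask style: portable */

typedef struct {                /* bit-field style: reads nicely, but bit  */
    uint8_t reserved_lo : 5;    /* order and padding are compiler-defined, */
    uint8_t setup_avail : 1;    /* hence less portable across toolchains   */
    uint8_t reserved_hi : 2;
} usb_irq_bits_t;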
When writing a device driver for any interface (USB, parallel port, etc.), the code to be developed depends on whether there is an operating system (OS) or RTOS running on that processor/microcontroller.
E.g., if it's going to run, say, WinCE, then WinCE has its own driver development kit and steps to be followed in device driver development. The same goes for any other OS like Linux or Symbian.
If it's going to be plain firmware code (no OS) controlling the processor/microcontroller, then it's a different situation altogether.
So, based on whichever of the above situations you are in, one needs to read and understand:
1.) The hardware specification of the processor/microcontroller development board - register files, ports, memory layout, etc.
2.) The USB spec
3.) A couple of pointers I found quickly. Google should be your friend!
http://www.lrr.in.tum.de/Par/arch/usb/usbdoc/ - Linux USB device driver
http://www.microsoft.com/technet/archive/wce/support/usbce.mspx
-AD
I've used an earlier edition of USB Complete by Jan Axelson. Indeed very complete.
From the editorial review:
Now in its fourth edition, this developer's guide to the Universal Serial Bus (USB) interface covers all aspects of project development, such as hardware design, device firmware, and host application software.
I'm curious, why did you pick the 9S12? I used it at a previous job, and was not pleased.
It had lousy gcc support, so we used Metrowerks,
which may have been okay for C, but often generated buggy C++.
It had a lousy IDE with binary project files!
The 9S12 was also slow; a lot of instructions executed in 5 cycles.
It was not very power efficient, either.
It has no barrel shifter, which makes operations that are common in embedded code slow.
And it's not that cheap.
About the only thing I dislike more is an 8051. I'm using an ARM Cortex-M3 at my current job, and it's better than a 9S12 in every way (faster clock, more work done per clock, less power consumption, cheaper, good gcc support, 32-bit vs. 16-bit).
I don't know which hardware you're planning to use but assuming that's flexible, STMicro offers a line of microcontrollers with USB/SPI support and a library of C-code that can be used with their parts. -- I've used their ARM7 series micros for years with great success.
Here is an excellent site maintained by Jonathan Valvano, a professor at the University of Texas. He teaches four courses over there (three undergraduate, one graduate), all are about using a 9S12 microcontroller. His site contains all the lecture notes, lab manuals, and more importantly, starter files, that he uses for all his classes.
The website looks like it's from the 90's, but just dig around a bit and you should find everything you need.
users.ece.utexas.edu/~valvano/
Consider AVR for your next MCU project because of its wonderful LUFA and V-USB libraries.
I'm working on a project using the Atmel SAM V71. The processor is very powerful, and among lots of high-end connectivity offered on-chip is a USB engine that will do device or host mode at high speed (480 Mbit/s) or full speed (12 Mbit/s), though not USB 3.0. The tools are free and come with a number of host and device USB example projects with all the USB stack code right there. It supports 10 endpoints, and all the transfers are done via DMA, so you have most of the processor's horsepower available for other tasks. The Atmel USB stack works without needing an RTOS.