USB-HID Gamepad with MSP430 USB API - Issue with Descriptors

I am trying to emulate a USB-HID gamepad/controller using the MSP430 with 7-14 analog inputs, but I am having trouble getting the descriptors right for my custom USB-HID device.
I came across this code online: https://github.com/TI-FIRST/MSP430-Gamepad, which worked great for getting the MSP430 up and running as a gamepad with only 8 analog inputs.
The main.c file contains instructions to change the report structure:
This example functions as a gamepad on the host. The gamepad has a HID report as described in
report_desc_HID0 variable in descriptors.c. Please note that if this report structure is
changed then the following lengths need to be updated -
#define report_desc_size_HID0 in descriptors.h needs to be updated with descriptor size
report_desc_size and report_len_input need to be updated in descriptors.c
As is, this demo will enumerate with 18 bytes of input report and 2 bytes of output report
The input and output report structures for the gamepad are described in USB_gamepad.h
The input reports are used to report ADC values and status of buttons (GPIO)
The output report is used to set/reset indicators (GPIO)
The descriptors currently in the descriptors.c file are:
UsagePage(USB_HID_GENERIC_DESKTOP),
Usage(USB_HID_JOYSTICK),
Collection(USB_HID_APPLICATION),
//
// The axis for the controller.
//
UsagePage(USB_HID_GENERIC_DESKTOP),
Usage (USB_HID_POINTER),
Collection (USB_HID_PHYSICAL),
//
// The 8 axis values, specified as 16-bit absolute
// position values.
//
Usage (USB_HID_X),
Usage (USB_HID_Y),
Usage (USB_HID_Z),
Usage (USB_HID_RX),
Usage (USB_HID_RY),
Usage (USB_HID_RZ),
Usage (USB_HID_SLIDER),
Usage (USB_HID_DIAL),
//
// 8 16-bit absolute values.
//
ReportSize(16),
ReportCount(8),
Input(USB_HID_INPUT_DATA | USB_HID_INPUT_VARIABLE |
USB_HID_INPUT_ABS),
//
// Max 32 buttons.
//
UsagePage(USB_HID_BUTTONS),
UsageMinimum(1),
UsageMaximum(NUM_BUTTONS),
LogicalMinimum(0),
LogicalMaximum(1),
PhysicalMinimum(0),
PhysicalMaximum(1),
//
// 32 1-bit values for the buttons.
//
ReportSize(1),
ReportCount(32),
Input(USB_HID_INPUT_DATA | USB_HID_INPUT_VARIABLE |
USB_HID_INPUT_ABS),
//
// Max 16 indicator bits
//
UsagePage(USB_HID_BUTTONS),
UsageMinimum(1),
UsageMaximum(NUM_INDICATORS),
LogicalMinimum(0),
LogicalMaximum(1),
PhysicalMinimum(0),
PhysicalMaximum(1),
//
// 16 1-bit values for the LEDs.
//
ReportSize(1),
ReportCount(16),
Output(USB_HID_INPUT_DATA | USB_HID_INPUT_VARIABLE |
USB_HID_INPUT_ABS),
EndCollection,
EndCollection
I would like to change it to 14 16-bit analog inputs like this:
UsagePage(USB_HID_GENERIC_DESKTOP),
Usage(USB_HID_JOYSTICK),
Collection(USB_HID_APPLICATION),
//
// The axis for the controller.
//
UsagePage(USB_HID_GENERIC_DESKTOP),
Usage (USB_HID_POINTER),
Collection (USB_HID_PHYSICAL),
//
// The 14 axis values, specified as 16-bit absolute
// position values.
//
Usage (USB_HID_X),
Usage (USB_HID_Y),
Usage (USB_HID_Z),
Usage (USB_HID_RX),
Usage (USB_HID_RY),
Usage (USB_HID_RZ),
Usage (USB_HID_SLIDER),
Usage (USB_HID_DIAL),
Usage (USB_HID_VX),
Usage (USB_HID_VY),
Usage (USB_HID_VZ),
Usage (USB_HID_VRX),
Usage (USB_HID_VRY),
Usage (USB_HID_VRZ),
//
// 14 16-bit absolute values.
//
ReportSize(16),
ReportCount(14),
Input(USB_HID_INPUT_DATA | USB_HID_INPUT_VARIABLE |
USB_HID_INPUT_ABS),
//
// Max 32 buttons.
//
UsagePage(USB_HID_BUTTONS),
UsageMinimum(1),
UsageMaximum(6),
LogicalMinimum(0),
LogicalMaximum(1),
PhysicalMinimum(0),
PhysicalMaximum(1),
//
// 32 1-bit values for the buttons.
//
ReportSize(1),
ReportCount(32),
Input(USB_HID_INPUT_DATA | USB_HID_INPUT_VARIABLE |
USB_HID_INPUT_ABS),
//
// Max 16 indicator bits
//
UsagePage(USB_HID_BUTTONS),
UsageMinimum(1),
UsageMaximum(6),
LogicalMinimum(0),
LogicalMaximum(1),
PhysicalMinimum(0),
PhysicalMaximum(1),
//
// 16 1-bit values for the LEDs.
//
ReportSize(1),
ReportCount(16),
Output(USB_HID_INPUT_DATA | USB_HID_INPUT_VARIABLE |
USB_HID_INPUT_ABS),
EndCollection,
EndCollection
However, I cannot figure out how to calculate the length/size in bytes of the descriptor. I tried going through the USB-HID spec (Device Class Definition for HID 1.11 | USB-IF), which states that items have a byte prefix, but I can't really figure out which items to count and how they add up. Apologies, but I am very inexperienced with USB.
Would someone be able to look at the code and let me know what values I need for report_desc_size and report_len_input in the descriptor files, plus anything else I need to change to expand the functionality of this code to 14 16-bit analog inputs?
P.S. To replicate and see the working gamepad, just upload the code to a dev kit and search 'Set up USB Game Controllers' on Windows, which should recognize it as a gamepad if everything is running correctly and the reports are being accepted.

The size of the report descriptor is sizeof(report_desc_HID0). For convenience I'd recommend restructuring the code so you compute the descriptor length at compile time instead of hard-coding it.
You're using macros to construct the descriptor, and each macro adds a fixed number of bytes. You've added six new Usage items, each of which expands to two bytes:
#define Usage(ui8Value) 0x09, ((ui8Value) & 0xff)
If the original descriptor is 80 bytes, I'd guess the new descriptor size is 92 bytes.

Related

Adafruit Trinket M0 (SAMD21) analog read rate slow

Why is the analog read rate seemingly slow (46 ksamples/s) when it should be fast (250 ksamples/s) for my Adafruit Trinket M0? See this simple Arduino code for details; why is PointCount only 46?
//TrinketReadRateTest
//27Nov2022
//Running on Adafruit Trinket M0, SAMD21
//Measures read times of analog reads on Trinket M0
//nothing at all connected to the Trinket
//according to the settings in this wiring.c file lines 160-173, samples per second should be = 250,000:
//C:\Users\<MyUserName>\AppData\Local\Arduino15\packages\adafruit\hardware\samd\1.7.11\cores\arduino\wiring.c
//in this loop, every PointCount is 2 samples, so in 2 millisecs, number of PointCounts should be:
//(.002 secs)(250000 samples/sec)(PointCounts/ 2 samples) = 250
//however, this routine gives a value of 46 WHY?
//if line 170 prescaler is set to DIV16 instead of DIV32, PointCounts gets to 66 (accuracy ???) so this wiring.c is being loaded
#define INPUT1 A3 //ATSAMD21G PA04
#define INPUT2 A4 //ATSAMD21G PA05
unsigned int Input1[1000];
unsigned int Input2[1000];
unsigned int PointCount = 0;
void setup() {
pinMode(INPUT1, INPUT);
pinMode(INPUT2, INPUT);
}
void loop() {
PointCount = 0;
unsigned long StartTime = micros();
do {
Input1[PointCount] = analogRead(INPUT1);
Input2[PointCount] = analogRead(INPUT2);
PointCount++;
} while (micros() - StartTime < 2000); //read 2 millisecs of data points as fast as they come
Serial.begin(9600); //keep serial off during data reads to avoid the question...
delay(1000);
Serial.println(PointCount);
Serial.end();
delay(1000);
}
I tried reading analog samples as fast as they would come. I expected to receive samples at a rate of 250000 per second. What actually resulted was a rate of 46000 samples per second.
Added 28Nov: the wiring.c file is not easy to find. If you want it:
download the tar.bz2 file:
https://adafruit.github.io/arduino-board-index/boards/adafruit-samd-1.7.11.tar.bz2
extract the archive using 7-Zip or a similar tool
go to cores\arduino\wiring.c
Here are the relevant lines of wiring.c:
//set to 1/(1/(48000000/32) * 6) = 250000 SPS
while(GCLK->STATUS.reg & GCLK_STATUS_SYNCBUSY);
GCLK->CLKCTRL.reg = GCLK_CLKCTRL_ID( GCM_ADC ) | // Generic Clock ADC
GCLK_CLKCTRL_GEN_GCLK0 | // Generic Clock Generator 0 is source
GCLK_CLKCTRL_CLKEN ;
while( ADC->STATUS.bit.SYNCBUSY == 1 ); // Wait for synchronization of registers between the clock domains
ADC->CTRLB.reg = ADC_CTRLB_PRESCALER_DIV32 | // Divide Clock by 32.
ADC_CTRLB_RESSEL_10BIT; // 10 bits resolution as default
ADC->SAMPCTRL.reg = 5; // Sampling Time Length
Adding this additional question 8Dec2022:
wiring_analog.c (in the same folder as wiring.c) implements the analog routines. Line 369 of wiring_analog.c says the same thing as the SAMD21 data sheet: "The first conversion after the reference is changed must not be used."
In lines 371-394, the analogRead routine for the SAMD21, two reads are always made; the first accounts for the statement above. But why do two reads on every analogRead call? The analog reference is not changed with every read and is set before any reads. So why not do a single discarded conversion right after the reference is set? That way, each analogRead only needs one conversion.
I moved the first conversion routine to the very end of analogReference. It speeds things up to PointCount = 79. Is this a problem? It does not seem to reduce accuracy.
Your second question is easier to answer than your first. The reason there are two ADC reads in the Arduino code is that there is a bug in the ADC hardware on the SAMD21. In the past, Arduino provided a calibration method that allowed you to correct for this instead of adding the second read and throwing away the first, garbage result. This was problematic for a number of reasons, and eventually the library was modified. There's an old Hackaday article that provides a little more detail.
As for the ADC reads being slow, the limitation you're running into is a limitation of the SAMD library for Arduino. For reference, I am using the SAMD21 datasheet and the code from Arduino SAMD on GitHub. To start with, the clock speed should be 48 MHz. Using the DIV32 prescaler, the ADC clock frequency is 1.5 MHz. Each ADC conversion in the SAMD21 library takes 63 clock cycles, leaving you with about 23.8 kHz. 23.8 kHz × 2 ms ≈ 47.6 conversions. Add on top of that the overhead of switching between the two input pins (I don't know the exact figure, but likely 1-2 clock pulses) and you'd end up closer to 46 conversions in 2 ms.
63 clock pulses per conversion is comically high. Typically, the first read is closer to 20 pulses and subsequent ones are 13.5. There is another post on the electrical engineering Stack Exchange where someone tackles this and posts a link to their own library for improving the conversion speeds.

STM32 USB Custom HID only 1 byte per transaction

I know that the maximum speed of a USB HID device is 64 KB/s, but on the oscilloscope I see transactions every 1 ms that contain only ONE byte. My HID report descriptor is listed below. What must I change to achieve 64 KB/s? Currently my bInterval = 0x01 (1 ms polling for the interrupt endpoint), but the actual speed is 65 bytes/s, because a report ID byte is added to my 64 bytes of data. I don't think USB should split a single 64+1-byte packet into 65 single-byte packets. For this experiment I use reportID=1 (from STM32 to PC). On the PC side I use hidapi.dll.
__ALIGN_BEGIN static uint8_t CUSTOM_HID_ReportDesc_FS[USBD_CUSTOM_HID_REPORT_DESC_SIZE] __ALIGN_END =
{
/* USER CODE BEGIN 0 */
USAGE_PAGE(USAGE_PAGE_UNDEFINED)
USAGE(USAGE_UNDEFINED)
COLLECTION(APPLICATION)
REPORT_ID(1)
USAGE(1)
LOGICAL_MIN(0)
LOGICAL_MAX(255)
REPORT_SIZE(8)
REPORT_COUNT(64)
INPUT(DATA | VARIABLE | ABSOLUTE)
REPORT_ID(2)
USAGE(2)
LOGICAL_MIN(0)
LOGICAL_MAX(255)
REPORT_SIZE(8)
REPORT_COUNT(64)
OUTPUT(DATA | VARIABLE | ABSOLUTE)
REPORT_ID(3)
USAGE(3)
LOGICAL_MIN(0)
LOGICAL_MAX(255)
REPORT_SIZE(8)
REPORT_COUNT(64)
OUTPUT(DATA | VARIABLE | ABSOLUTE)
REPORT_ID(4)
USAGE(4)
LOGICAL_MIN(0)
LOGICAL_MAX(255)
REPORT_SIZE(8)
REPORT_COUNT(64)
OUTPUT(DATA | VARIABLE | ABSOLUTE)
/* USER CODE END 0 */
0xC0 /* END_COLLECTION */
};
HID uses interrupt IN/OUT transfers to convey reports. In USB, interrupt endpoints are polled by the host; for a full-speed device the shortest polling interval is 1 ms. Every time the endpoint is polled, it may yield up to 64 bytes (for full speed; low-speed interrupt endpoints are limited to 8 bytes). That's probably where you get the 64 kB/s figure from. The actual limit is 1000 transactions, i.e. at most 1000 reports of up to 64 bytes, per second. Also note these limits are different for high-speed and SuperSpeed devices.
The report descriptor is one thing; what you actually send on the interrupt IN endpoint is something else. They should match, but nothing enforces this. You should probably look at the code that builds the interrupt IN transfer payload.
Side note: if all you want is to send arbitrary chunks of data, HID is probably not the right class. Bulk endpoints look more appropriate (and you won't be limited by the interrupt endpoint polling rate).

Read variable length messages over SPI using Low Level (LL) api on STM32 MCU

My system is composed of an STM32 Nucleo board and a slave device connected over SPI. The slave device sends commands of variable length: possible lengths are 4, 8, 10 and 14 bits.
I'm trying to detect these messages on my nucleo board using the LL APIs and interrupts.
The solution I'm currently working on sets up the SPI with a data width of 4 bits (SPI_InitStruct.DataWidth = LL_SPI_DATAWIDTH_4BIT) and then counts the number of words (1 word = 4 bits) received. This way, 1 word means a 4-bit command and 2 words mean an 8-bit command. 3 words should mean a 10-bit command (2 bits to be discarded), and so on.
Unfortunately, I have noticed that the LL API only provides functions for reading 8 or 16 bits at a time, and I'm having trouble receiving a 4-bit command, since LL_SPI_ReceiveData8 expects to receive 8 bits.
Here is my implementation for the IRQ handler and for the callback:
IRQ Handler:
void SPI1_IRQHandler(void)
{
/* Check RXNE flag in the SR register */
if(LL_SPI_IsActiveFlag_RXNE(SPI1))
{
/* Call function Slave Reception Callback */
SPI1_Rx_Callback();
}
/* Check OVR (overrun) flag in the SR register */
else if(LL_SPI_IsActiveFlag_OVR(SPI1))
{
/* Call Error function */
SPI1_TransferError_Callback();
}
}
Callback
void SPI1_Rx_Callback(void)
{
/* Read character in Data register.
RXNE flag is cleared by reading data in DR register */
aRxBuffer[ubReceiveIndex++] = LL_SPI_ReceiveData8(SPI1);
}
As mentioned above, I think the problem is that I'm reading with LL_SPI_ReceiveData8, because I could not find anything like LL_SPI_ReceiveData4.
Do you have some suggestions?
Furthermore, is it possible to set the SPI to use 2 bit datawidth instead of 4? Something like SPI_InitStruct.DataWidth = LL_SPI_DATAWIDTH_2BIT: in this way it should be easier to detect the commands since 4, 8, 10 and 14 are multiples of 2.
Thank you.
With the new information about the controller you are using:
It supports SPI data transfer lengths between 4 and 16 bits, so your first try is not bad at all.
Your "problem" is that there is no 4-bit read function. This is because the receive data register always holds 16 bits, of which only 4 are valid data in your case; the other bits are '0'.
Your callback function
aRxBuffer[ubReceiveIndex++] = LL_SPI_ReceiveData8(SPI1);
will write values from 0 to 15 into aRxBuffer, and you don't need a ReceiveData4() to get your answer :-)
See also the STM32L4 series reference manual, page 1193 ff.
The smallest addressable unit of data is a byte, so even when you receive 4 bits, the value you read is 8 bits wide.
BTW, what is this mysterious slave device with a varying word length?

Arduino + OV7670 - Without FIFO - Reading Snapshot

I know that there is a lot on the internet (http://forum.arduino.cc/index.php?topic=159557.0 for example) about the OV7670, and I have read a lot of it, but something seems to be missing.
First of all, I looked at how to read the camera pixel by pixel to build the rectangular 640 × 480 image, and this was quite easy to understand given the HREF, VSYNC and PCLK signals described in the documentation here: http://www.voti.nl/docs/OV7670.pdf. I understand XCLK as an input clock I need to give the OV7670 as a kind of cycle controller, and RESET as a way to reset it.
So at this point I thought that the functionality of such camera would be covered by wiring the following pins:
D0..D7 - data (pixel) lines, connected to Arduino digital pins 0 to 7 as INPUT
XCLK - camera clock, connected to Arduino digital pin 8 as OUTPUT
PCLK - pixel clock, connected to Arduino digital pin 9 as INPUT
HREF - marks when a line starts/ends, connected to Arduino digital pin 10 as INPUT
VSYNC - marks when a frame starts/ends, connected to Arduino digital pin 11 as INPUT
GND - ground, connected to Arduino GND
3V3 - 3.3 V supply input, connected to Arduino 3.3 V
RESET - connected to Arduino RESET
PWDN - connected to Arduino GND
The implementation for such approach from my point of view would be something like:
Code:
for each loop function do
write high to XCLK
if VSYNCH is HIGH
return;
if HREF is LOW
return;
if lastPCLOCK was HIGH and currentPCLOCK is LOW
readPixelFromDataPins();
end for
My readPixelFromDataPins() basically reads just the first byte (as I'm only testing whether I can read anything at all from the camera), and it is written as follows:
Code:
byte readPixelFromDataPins() {
byte result = 0;
for (int i = 0; i < 8; i++) {
result = result << 1 | digitalRead(data_p[i]);
}
return result;
}
To check whether anything is being read from the camera, I just print the byte read from the data pins as a number to Serial at 9600 baud. But currently I'm receiving only zeros. The code I'm using to retrieve an image is stored here: https://gist.github.com/franciscospaeth/8503747.
Has anybody who got the OV7670 working with Arduino figured out what I am doing wrong? I suppose I'm using XCLK wrongly, right? What should I do to get it working?
I searched a lot and didn't find any SSCCE (http://sscce.org/) for this camera using Arduino; if somebody has one, please let me know.
This question is also posted on the Arduino forum (http://forum.arduino.cc/index.php?topic=211741.0).
Your idea is not bad, but:
XCLK needs to be a clock (in your program it is just a single transition from 0 to 1, and then it stays there).
You also need to use I2C (SCCB) with SIOC and SIOD to configure the camera (or you can use the default settings, but I am not sure they give the correct output format for you: 30 fps, VGA, YUV format, ...).
Your code runs slower because of the serial output in the same loop that reads the data.
I recommend toggling the XCLK pin and moving the pixel print into an if(). Also, you will only be able to read data at a very precise time: if you want to read only one byte, then after a 0-to-1 transition of HREF you need to wait for the next 0-to-1 transition of PCLK. (You will see only one 0-to-1 transition of HREF per 784 × 2 PCLK transitions: (640 active pixels + 144 dead time per line) × 2, because YUV or RGB uses 2 bytes per pixel.)
Hello, I am Mr_Arduino from the Arduino forums. Your issue is that you are reading pixels too slowly; please do not use digitalRead() for this. Also, if you insist on using a separate function just to read a byte, make sure the function is inlined; you can do this by declaring it static inline. And, as mentioned above, how are you generating the clock? You can generate XCLK using PWM on the Arduino.
I have created a working example here:
https://github.com/ComputerNerd/arduino-camera-tft/blob/master/captureimage.c
Edit: a 3rd party has copied part but not all of the code from the above link into the answer here. However, the link must remain as the code posted below requires additional files from that source to actually work.
Edit 2: Removed irrelevant code. You will need to modify what you do with the data.
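// Copies one frame from the camera data bus (read from PINC) to the TFT data
// bus (PORTA), strobing WR once per byte. PINE bit 5 (value 32) is waited on
// once per frame, PINE bit 4 (value 16) once per byte (pixel clock).
void capImg(void){
cli(); // no interrupts during the timing-critical copy
uint8_t w,ww;
uint8_t h;
w=160; // 160 iterations x 4 bytes = 640 bytes per line
h=240; // 240 lines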
tft_setXY(0,0);
CS_LOW;
RS_HIGH;
RD_HIGH;
DDRA=0xFF;
//DDRC=0;
#ifdef MT9D111
while (PINE&32){}//wait for low
while (!(PINE&32)){}//wait for high
#else
while (!(PINE&32)){}//wait for high
while (PINE&32){}//wait for low
#endif
while (h--){
ww=w;
while (ww--){
WR_LOW;
while (PINE&16){}//wait for low
PORTA=PINC;
WR_HIGH;
while (!(PINE&16)){}//wait for high
WR_LOW;
while (PINE&16){}//wait for low
PORTA=PINC;
WR_HIGH;
while (!(PINE&16)){}//wait for high
WR_LOW;
while (PINE&16){}//wait for low
PORTA=PINC;
WR_HIGH;
while (!(PINE&16)){}//wait for high
WR_LOW;
while (PINE&16){}//wait for low
PORTA=PINC;
WR_HIGH;
while (!(PINE&16)){}//wait for high
}
}
CS_HIGH;
sei();
}
You can also find it on GitHub.
You can use my instructions, "how to retrieve image from ov7670"; they contain all the steps you need. There are also instructions for setting up FrameGrabber: "how to run framegrabber".

How can I fix this code to allow my AVR to talk over the serial port?

I've been pulling my hair out lately trying to get an ATmega162 on my STK200 to talk to my computer over RS232. I checked and made sure that the STK200 contains a MAX202CPE chip.
I've configured the chip to use its internal 8 MHz clock, divided by 8.
I've tried to copy the code out of the data sheet (and made changes where the compiler complained), but to no avail.
My code is below; could someone please help me fix the problems I'm having?
I've confirmed that my serial port works on other devices and is not faulty.
Thanks!
#include <avr/io.h>
#include <avr/iom162.h>
#define BAUDRATE 4800
void USART_Init(unsigned int baud)
{
UBRR0H = (unsigned char)(baud >> 8);
UBRR0L = (unsigned char)baud;
UCSR0B = (1 << RXEN0) | (1 << TXEN0);
UCSR0C = (1 << URSEL0) | (1 << USBS0) | (3 << UCSZ00);
}
void USART_Transmit(unsigned char data)
{
while(!(UCSR0A & (1 << UDRE0)));
UDR0 = data;
}
unsigned char USART_Receive()
{
while(!(UCSR0A & (1 << RXC0)));
return UDR0;
}
int main()
{
USART_Init(BAUDRATE);
unsigned char data;
// all are 1, all as output
DDRB = 0xFF;
while(1)
{
data = USART_Receive();
PORTB = data;
USART_Transmit(data);
}
}
I have commented on Greg's answer, but would like to add one more thing. For this sort of problem the gold standard method of debugging it is to first understand asynchronous serial communications, then to get an oscilloscope and see what's happening on the line. If characters are being exchanged and it's just a baudrate problem this will be particularly helpful as you can calculate the baudrate you are seeing and then adjust the divisor accordingly.
Here is a super quick primer, no doubt you can find something much more comprehensive on Wikipedia or elsewhere.
Let's assume 8 data bits, no parity, 1 stop bit (the most common setup). Then if the character being transmitted is, say, 0x3f (= ASCII '?'), the line looks like this:
...--+   +---+---+---+---+---+---+       +---+--...
     | S | 1   1   1   1   1   1 | 0   0 | E
     +---+                       +---+---+
The high (1) level is +5V at the chip and -12V after conversion to RS232 levels.
The low (0) level is 0V at the chip and +12V after conversion to RS232 levels.
S is the start bit.
Then we have 8 data bits, least significant first, so here 00111111 = 0x3f = '?'.
E is the stop (e for end) bit.
Time advances from left to right, just like on an oscilloscope display. If the baud rate is 4800, each bit spans 1/4800 s ≈ 0.21 ms.
The receiver works by sampling the line and looking for a falling edge (an idle line is simply logical '1' all the time). The receiver knows the baud rate and the number of start bits (1), so it measures half a bit time from the falling edge to find the middle of the start bit, then samples the line 8 times, one bit time apart, to collect the data bits. The receiver then waits one more bit time (until halfway through the stop bit) and starts looking for another start bit (i.e. falling edge). Meanwhile, the character read is made available to the rest of the system. The transmitter guarantees that the next falling edge won't begin until the stop bit is complete. The transmitter can be programmed to always wait longer (with additional stop bits), but that is a legacy issue; extra stop bits were only required with very slow hardware and/or software setups.
I don't have reference material handy, but the baud rate register UBRR usually contains a divisor value rather than the desired baud rate itself. A quick Google search indicates that the correct divisor value for 4800 baud may be 239. So try:
unsigned int divisor = 239;
UBRR0H = (unsigned char)(divisor >> 8);
UBRR0L = (unsigned char)divisor;
If this doesn't work, check with the reference docs for your particular chip for the correct divisor calculation formula.
For debugging UART communication, there are two useful things to do:
1) Do a loop-back at the connector and make sure you can read back what you write. If you send a character and get it back exactly, you know that the hardware is wired correctly, and that at least the basic set of UART register configuration is correct.
2) Repeatedly send the character 0x55 ("U") - the binary bit pattern 01010101 will allow you to quickly see the bit width on the oscilloscope, which will let you verify that the speed setting is correct.
After reading the data sheet a little more thoroughly, I found that I was setting the baud rate incorrectly. The ATmega162 data sheet has a table of clock frequencies against baud rates with the corresponding error.
For a 4800 baud rate and a 1 MHz clock frequency, the error was 0.2%, which was acceptable for me. The trick was passing 12 to the USART_Init() function, instead of 4800.
Hope this helps someone else out!