Interrupt service routine to measure a phase difference [closed] - embedded

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I'm currently working with 9S12 from freescale, I really need some help in order to understand how to write correctly an ISR. In particular, I'm reporting the text of an exercise in which they asked me to measure the difference in phase between two square waveforms (at the input of the microcontroller).
The bus clock is 16 MHz and I have to use the timer module of the system, which provides a free-running counter (TCNT # 16 bit). The counter have to work # 500 kHz, that is achieved by setting a prescaler of 5, starting from the bus clock.
The two signals have the same frequency, which is given (25 Hz), but is required to measure it anyway.
I have to use the INTERRUPT procedure, using the correct registers (actually is not necessary using the exact same ones from the manual, I can use any names I want, instead I have to comment every line of the code) and variables.
My way of approaching the problem is very theoretical, but I need the C code.
In order to solve the problem I have to use the INPUT CAPTURE MODE, to measure the difference in term of units counted by the TCNT (TICKS) between the positive edge of signal 1 and the positive edge of signal 2. My doubts are in particular on the variables that I have to use, the type (LOCAL, GLOBAL, UNSIGNED, LONG (?)), how can I update the values correctly in the ISR and if I should take into account of the overflows of the counter and the respective interrupts generated by them.
I'm stuck in this problem, I hope someone can help me with some code examples in particular for the variables that I have to use and how to write the actual ISR.
Thank you to everyone!

Treat the following as pseudo-code; it is for you to wade through the datasheet to determine how to configure and access the timer-capture unit - I am not familiar with the specific part
In general, given the timer-counter is 16 bit:
volatile static uint16_t phase_count = 0 ;
volatile static uint16_t mean_period_count = 0 ;
int phasePercent()
{
(phase_count * 100) / mean_period_count ;
}
int frequencyHz()
{
500000 / mean_period_count ;
}
void TimerCounterISR( void )
{
static uint16_t count1 = 0 ;
static uint16_t count2 = 0 ;
static uint16_t period1 = 0 ;
static uint16_t period2 = 0 ;
uint16_t now = getCaptureCount() ;
if( isSignal1Edge() )
{
period1 = now - count1 ;
count1 = now ;
}
else if( isSignal2Edge() )
{
period2 = now - count2 ;
count2 = now ;
phase_count = period2 - period1 ;
}
mean_period_count = (period1 + period2) >> 1 ;
}
The method assumes an up-counter and requires that the counter reload runs the full 16 bit range 0 to 0xFFFF - otherwise the modulo-216 arithmetic will not work, and the solution will be much more complex. For a down-counter, swap the operands in the period calculations.
Note the return value from phasePercent() and frequencyHz() will not be valid until after a complete rising-edge to rising-edge cycle on both phases. You could add an edge count and validate after the rising edge has been seen twice on each signal if that is an issue.

This is how you sort out the technical parts for the S12:
Setup the ECT timer with appropriate pre-scaler. You seem to have this part covered? At 500kHz each timer tick is 2us and you need 65536 timer ticks to cover the worst case. One full TCNT period will be 131ms with your prescaler. If ~25Hz is the worst case, then that's roughly 40ms so it should be ok in that case.
Pick a timer channel TC that corresponds to the pin used. This channel needs to be configured as input capture, trigger on both rising/falling edge. Store the current value of TCNT as init value to a uint16_t time counter variable.
Register the ISR in the vector table etc, the usual stuff for writing an ISR.
Upon interrupt, read the value of the TCn channel register. The difference between the counter variable and the stored value gives the period time in timer cycles. Multiply this with 1/500kHz and you get the period time. In the ISR always read TC and not TCNT, as the former will not be affected by interrupt latency and code execution overhead. Update your counter variable with the value from TCn.
With a design like this ensure there are some external means of filtering out spikes and EMI from the input: RC filter or similar.

Related

Adafruit Trinket M0 (SAMD21) analog read rate slow

Why is the analog read rate seemingly slow (46 ksamples/s) when it should be fast (250 ksamples/s) for my Adafruit Trinket M0? See this simple Arduino code for details; why is PointCount only 46?
//TrinketReadRateTest
//27Nov2022
//Running on Adafruit Trinket M0, SAMD21
//Measures read times of analog reads on Trinket M0
//nothing at all connected to the Trinket
//according to the settings in this wiring.c file lines 160-173, samples per second should be = 250,000:
//C:\Users\<MyUserName>\AppData\Local\Arduino15\packages\adafruit\hardware\samd\1.7.11\cores\arduino\wiring.c
//in this loop, every PointCount is 2 samples, so in 2 millisecs, number of PointCounts should be:
//(.002 secs)(250000 samples/sec)(PointCounts/ 2 samples) = 250
//however, this routine gives a value of 46 WHY?
//if line 170 prescaler is set to DIV16 instead of DIV32, PointCounts gets to 66 (accuracy ???) so this wiring.c is being loaded
#define INPUT1 A3 //ATSAMD21G PA04
#define INPUT2 A4 //ATSAMD21G PA05
unsigned int Input1[1000];
unsigned int Input2[1000];
unsigned int PointCount = 0;
void setup() {
pinMode(INPUT1, INPUT);
pinMode(INPUT2, INPUT);
}
void loop() {
PointCount = 0;
unsigned long StartTime = micros();
do {
Input1[PointCount] = analogRead(INPUT1);
Input2[PointCount] = analogRead(INPUT2);
PointCount++;
} while (micros() - StartTime < 2000); //read 2 millisecs of data points as fast as they come
Serial.begin(9600); //keep serial off during data reads to avoid the question...
delay(1000);
Serial.println(PointCount);
Serial.end();
delay(1000);
}
I tried reading analog samples as fast as they would come. I expected to receive samples at a rate of 250000 per second. What actually resulted was a rate of 46000 samples per second.
Added 28Nov: the wiring.c file is not easy to find. If you want it:
download the tar.bz2 file:
https://adafruit.github.io/arduino-board-index/boards/adafruit-samd-1.7.11.tar.bz2
extract the tar file using 7-zip or whatever
goto cores\arduino\wiring.c
Here are the relevant lines of wiring.c:
//set to 1/(1/(48000000/32) * 6) = 250000 SPS
while(GCLK->STATUS.reg & GCLK_STATUS_SYNCBUSY);
GCLK->CLKCTRL.reg = GCLK_CLKCTRL_ID( GCM_ADC ) | // Generic Clock ADC
GCLK_CLKCTRL_GEN_GCLK0 | // Generic Clock Generator 0 is source
GCLK_CLKCTRL_CLKEN ;
while( ADC->STATUS.bit.SYNCBUSY == 1 ); // Wait for synchronization of registers between the clock domains
ADC->CTRLB.reg = ADC_CTRLB_PRESCALER_DIV32 | // Divide Clock by 32.
ADC_CTRLB_RESSEL_10BIT; // 10 bits resolution as default
ADC->SAMPCTRL.reg = 5; // Sampling Time Length
Adding this additional question 8Dec2022:
wiring.analog.c (in same folder as wiring.c) executes the analog routines. Line 369 of wiring.analog.c says the same thing that the SAMD21 data sheet says: "The first conversion after the reference is changed must not be used."
In lines 371-394, the analogRead routine for SAMD21, two reads are always made; the first to account for the statement above. But why do two reads for every analogRead? The analog reference is not changed with every read and is set prior to any reads. So why not just do one conversion after the reference is set? That way, there only needs to be one conversion per analogRead.
I moved the first conversion routine to the very end of analogReference. It speeds things up to PointCount = 79. Is this a problem? It does not seem to reduce accuracy.
Your second question is easier to answer than your first. The reason there are two ADC reads in the Arduino code is because there is a bug in the ADC hardware on the SAMD21. In the past, Arduino provided a calibration method that allowed you to correct for this instead of adding in the second read and throwing out the first garbage data. This was problematic for a number of reasons and eventually library was modified. There's an old hackaday article that provides a little more detail.
As for the ADC reads being slow, the limitation you're running into is a limitation of the SAMD library for Arduino. For reference, I am using the SAMD21 datasheet and the code from Arduino SAMD on GitHub. To start out with, the Clock speed should be 48Mhz. Using the DIV32 predivider, the ADC clock frequency is 1.5Mhz. Each ADC conversion from the SAMD21 library takes 63 clock cycles. Leaving you with ~23.8Khz. 23.8Khz * 2ms = 47.619 Conversions. Add on top of that the overhead caused by switching between the two input pins (I don't know the exact characterization but likely 1-2 clock pulses) and you'd end up with closer to 46 Conversions in 2ms.
63 clock pulses per conversion is comically high. Typically, the first read is closer to 20 pulses and subsequent ones are 13.5. There is another post on the electrical engineering Stack Exchange where someone tackles this and posts a link to their own library for improving the conversion speeds.

nand2tetris HDL: Getting error "Sub bus of an internal node may not be used"

I am trying to make a 10-bit adder/subtractor. Right now, the logic works as intended. However, I am trying to set all bits to 0 iff there is overflow. To do this, I need to pass the output (tempOut) through a 10-bit Mux, but in doing so, am getting an error.
Here is the chip:
/**
* Adds or Subtracts two 10-bit values.
* Both inputs a and b are in SIGNED 2s complement format
* when sub == 0, the chip performs add i.e. out=a+b
* when sub == 1, the chip performs subtract i.e. out=a-b
* carry reflects the overflow calculated for 10-bit add/subtract in 2s complement
*/
CHIP AddSub10 {
IN a[10], b[10], sub;
OUT out[10],carry;
PARTS:
// If sub == 1, subtraction, else addition
// First RCA4
Not4(in=b[0..3], out=notB03);
Mux4(a=b[0..3], b=notB03, sel=sub, out=MuxOneOut);
RCA4(a=a[0..3], b=MuxOneOut, cin=sub, sum=tempOut[0..3], cout=cout03);
// Second RCA4
Not4(in=b[4..7], out=notB47);
Mux4(a=b[4..7], b=notB47, sel=sub, out=MuxTwoOut);
RCA4(a=a[4..7], b=MuxTwoOut, cin=cout03, sum=tempOut[4..7], cout=cout47);
// Third RCA4
Not4(in[0..1]=b[8..9], out=notB89);
Mux4(a[0..1]=b[8..9], b=notB89, sel=sub, out=MuxThreeOut);
RCA4(a[0..1]=a[8..9], b=MuxThreeOut, cin=cout47, sum[0..1]=tempOut[8..9], sum[0]=tempA, sum[1]=tempB, sum[2]=carry);
// FIXME, intended to solve overflow/underflow
Xor(a=tempA, b=tempB, out=overflow);
Mux10(a=tempOut, b=false, sel=overflow, out=out);
}
Instead of x[a..b]=tempOut[c..d] you need to use the form x[a..b]=tempVariableAtoB (creating a new internal bus) and combine these buses in your Mux10:
Mux10(a[0..3]=temp0to3, a[4..7]=temp4to7, ... );
Without knowing what line the compiler is complaining about, it is difficult to diagnose the problem. However, my best guess is that you can't use an arbitrary internal bus like tempOut because the compiler doesn't know how big it is when it first runs into it.
The compiler knows the size of the IN and OUT elements, and it knows the size of the inputs and outputs of a component. But it can't tell how big tempOut would be without parsing everything, and that's probably outside the scope of the compiler design.
I would suggest you refactor so that each RCA4 has a discrete output bus (ie: sum1, sum2, sum3). You can then use them and their individual bits as needed in the Xor and Mux10.

Measuring Program Execution Time with Cycle Counters

I have confusion in this particular line-->
result = (double) hi * (1 << 30) * 4 + lo;
of the following code:
void access_counter(unsigned *hi, unsigned *lo)
// Set *hi and *lo to the high and low order bits of the cycle
// counter.
{
asm("rdtscp; movl %%edx,%0; movl %%eax,%1" // Read cycle counter
: "=r" (*hi), "=r" (*lo) // and move results to
: /* No input */ // the two outputs
: "%edx", "%eax");
}
double get_counter()
// Return the number of cycles since the last call to start_counter.
{
unsigned ncyc_hi, ncyc_lo;
unsigned hi, lo, borrow;
double result;
/* Get cycle counter */
access_counter(&ncyc_hi, &ncyc_lo);
lo = ncyc_lo - cyc_lo;
borrow = lo > ncyc_lo;
hi = ncyc_hi - cyc_hi - borrow;
result = (double) hi * (1 << 30) * 4 + lo;
if (result < 0) {
fprintf(stderr, "Error: counter returns neg value: %.0f\n", result);
}
return result;
}
The thing I cannot understand is that why is hi being multiplied with 2^30 and then 4? and then low added to it? Someone please explain what is happening in this line of code. I do know that what hi and low contain.
The short answer:
That line turns a 64bit integer that is stored as 2 32bit values into a floating point number.
Why doesn't the code just use a 64bit integer? Well, gcc has supported 64bit numbers for a long time, but presumably this code predates that. In that case, the only way to support numbers that big is to put them into a floating point number.
The long answer:
First, you need to understand how rdtscp works. When this assembler instruction is invoked, it does 2 things:
1) Sets ecx to IA32_TSC_AUX MSR. In my experience, this generally just means ecx gets set to zero.
2) Sets edx:eax to the current value of the processor’s time-stamp counter. This means that the lower 64bits of the counter go into eax, and the upper 32bits are in edx.
With that in mind, let's look at the code. When called from get_counter, access_counter is going to put edx in 'ncyc_hi' and eax in 'ncyc_lo.' Then get_counter is going to do:
lo = ncyc_lo - cyc_lo;
borrow = lo > ncyc_lo;
hi = ncyc_hi - cyc_hi - borrow;
What does this do?
Since the time is stored in 2 different 32bit numbers, if we want to find out how much time has elapsed, we need to do a bit of work to find the difference between the old time and the new. When it is done, the result is stored (again, using 2 32bit numbers) in hi / lo.
Which finally brings us to your question.
result = (double) hi * (1 << 30) * 4 + lo;
If we could use 64bit integers, converting 2 32bit values to a single 64bit value would look like this:
unsigned long long result = hi; // put hi into the 64bit number.
result <<= 32; // shift the 32 bits to the upper part of the number
results |= low; // add in the lower 32bits.
If you aren't used to bit shifting, maybe looking at it like this will help. If lo = 1 and high = 2, then expressed as hex numbers:
result = hi; 0x0000000000000002
result <<= 32; 0x0000000200000000
result |= low; 0x0000000200000001
But if we assume the compiler doesn't support 64bit integers, that won't work. While floating point numbers can hold values that big, they don't support shifting. So we need to figure out a way to shift 'hi' left by 32bits, without using left shift.
Ok then, shifting left by 1 is really the same as multiplying by 2. Shifting left by 2 is the same as multiplying by 4. Shifting left by [omitted...] Shifting left by 32 is the same as multiplying by 4,294,967,296.
By an amazing coincidence, 4,294,967,296 == (1 << 30) * 4.
So why write it in that complicated fashion? Well, 4,294,967,296 is a pretty big number. In fact, it's too big to fit in an 32bit integer. Which means if we put it in our source code, a compiler that doesn't support 64bit integers may have trouble figuring out how to process it. Written like this, the compiler can generate whatever floating point instructions it might need to work on that really big number.
Why the current code is wrong:
It looks like variations of this code have been wandering around the internet for a long time. Originally (I assume) access_counter was written using rdtsc instead of rdtscp. I'm not going to try to describe the difference between the two (google them), other than to point out that rdtsc does not set ecx, and rdtscp does. Whoever changed rdtsc to rdtscp apparently didn't know that, and failed to adjust the inline assembler stuff to reflect it. While your code might work fine despite this, it might do something weird instead. To fix it, you could do:
asm("rdtscp; movl %%edx,%0; movl %%eax,%1" // Read cycle counter
: "=r" (*hi), "=r" (*lo) // and move results to
: /* No input */ // the two outputs
: "%edx", "%eax", "%ecx");
While this will work, it isn't optimal. Registers are a valuable and scarce resource on i386. This tiny fragment uses 5 of them. With a slight modification:
asm("rdtscp" // Read cycle counter
: "=d" (*hi), "=a" (*lo)
: /* No input */
: "%ecx");
Now we have 2 fewer assembly statements, and we only use 3 registers.
But even that isn't the best we can do. In the (presumably long) time since this code was written, gcc has added both support for 64bit integers and a function to read the tsc, so you don't need to use asm at all:
unsigned int a;
unsigned long long result;
result = __builtin_ia32_rdtscp(&a);
'a' is the (useless?) value that was being returned in ecx. The function call requires it, but we can just ignore the returned value.
So, instead of doing something like this (which I assume your existing code does):
unsigned cyc_hi, cyc_lo;
access_counter(&cyc_hi, &cyc_lo);
// do something
double elapsed_time = get_counter(); // Find the difference between cyc_hi, cyc_lo and the current time
We can do:
unsigned int a;
unsigned long long before, after;
before = __builtin_ia32_rdtscp(&a);
// do something
after = __builtin_ia32_rdtscp(&a);
unsigned long long elapsed_time = after - before;
This is shorter, doesn't use hard-to-understand assembler, is easier to read, maintain and produces the best possible code.
But it does require a relatively recent version of gcc.

clock pulse generator for a PLC

I am working with PLCs trying to design a water tank. On one section of the design I am asked to create a clock pulse generator. I am currently trying to do this using ladder diagrams.
I believe I have the logic correct just cant seem to put it together. I want a counter to count the clock pulses that I generate, then I store these pulese in a data memory to ensure the count is retained if the system is switched off and on.
question is how do I design this clock pulse generator.
Kind regards
There are a few different ways to create a pulse generator (or commonly known in the plc world as a BLINK timer). As a matter of fact many plc programming softwares have this function block built in to their function block libraries. But if they don't or you just want to make your own you can do something like this
VAR
ton1: TON;
ton2: TON;
StartPulse: BOOL;
startPulseTrig: R_TRIG;
LatchPulseInitial: BOOL;
PulseOutput: BOOL;
Timer1Done: BOOL;
Timer2Done: BOOL;
PulseWidth:TIME:=t#500ms;
END_VAR
If you would like to count the number of pulses and store this value to a variable you can use a simple CTU (counter up) block available in all plc languages.
Review of functionality
The StartPulse variable can be anything you want that will start the counter. In my case I just used an internal bool variable that I turned on. If you want this timer to start automatically when the plc starts then just initialize this variable to true. Because StartPulse only works on the rising edge of the signal the LatchPulseInitial coil will only ever be set once.
When the LatchPulseInitial variable goes true this will start ton1 a Timer On Delay (TON) function block. This will delay the output of the block from turning on for the time of PT (in my case I have 500ms).
After 500ms has expired the ton1 outputs will turn on. this will turn on the Timer1Done variable and turn off the Timer2Done and LatchPulseInitial. Now that LatchPulseInitial has been turned off it will never interfere with the program again since it can only be turned on by the rising edge of StartPulse. Note: once the block has reached PT the outputs will stay on until the input to the block is removed.
Since Timer1Done is now on ton2 will start counting until PT is reached. Once PT is reached the outputs for that block will turn on. This will reset Timer1Done and set Timer2Done This will start ton1 again and thus starting the whole process over.
For the PulseOutput, which is the actual pulse output you are looking for, I have this being set to true when Timer2Done is true. This is because when this variable is true it is the high state of the pulse generator. When Timer1Done is true it is the low state of the pulse generator.
When the PulseOutput goes true it will trigger the input on the CTU which will increment the count of the variable in CV (current value) by 1.
If you are going to be using this logic in numerous places in your program or you plan on reusing it in the future I would make this into its own function block so you won't have to repeat this logic everytime you want to make this type of timer.
Once I had to create a BLINK FB. It is written in Structured Text. But it is suitable to use in a ladder logic program and IN/OUT Variables are named like TON style. The Blink starts with Q = TRUE. If you want to start with FALSE just invert Q and switch the Times!
FUNCTION_BLOCK BLINK
VAR_INPUT
IN : BOOL;
PT_ON : TIME;
PT_OFF : TIME;
END_VAR
VAR_OUTPUT
Q : BOOL;
ET : TIME;
END_VAR
VAR
rtIN : R_TRIG;
tonBlink : TON;
END_VAR
rtIN(CLK := IN);
IF tonBlink.Q OR rtIN.Q THEN
(*Toggle Output*)
Q := NOT Q;
(*Timer Reset call, important to call timer twice in same cycle for correct Blink Time*)
tonBlink(IN:= FALSE);
(*Set corresponding Time*)
IF Q THEN
tonBlink.PT := PT_ON;
ELSE
tonBlink.PT := PT_OFF;
END_IF
END_IF
(*Timer Run call*)
tonBlink(IN:= IN);
IF IN THEN
ET := tonBlink.ET;
ELSE
ET := T#0S;
Q := FALSE;
END_IF
In my opinion, this is the most straightforward way to do it, using 1 timer, up counter and modulo operator:
Blink function in ladder
Also note, if your PLC doesnt have modulo, then multiply by -1 each time.

How I can fix this code to allow my AVR to talk over serial port?

I've been pulling my hair out lately trying to get an ATmega162 on my STK200 to talk to my computer over RS232. I checked and made sure that the STK200 contains a MAX202CPE chip.
I've configured the chip to use its internal 8MHz clock and divided it by 8.
I've tried to copy the code out of the data sheet (and made changes where the compiler complained), but to no avail.
My code is below, could someone please help me fix the problems that I'm having?
I've confirmed that my serial port works on other devices and is not faulty.
Thanks!
#include <avr/io.h>
#include <avr/iom162.h>
#define BAUDRATE 4800
void USART_Init(unsigned int baud)
{
UBRR0H = (unsigned char)(baud >> 8);
UBRR0L = (unsigned char)baud;
UCSR0B = (1 << RXEN0) | (1 << TXEN0);
UCSR0C = (1 << URSEL0) | (1 << USBS0) | (3 << UCSZ00);
}
void USART_Transmit(unsigned char data)
{
while(!(UCSR0A & (1 << UDRE0)));
UDR0 = data;
}
unsigned char USART_Receive()
{
while(!(UCSR0A & (1 << RXC0)));
return UDR0;
}
int main()
{
USART_Init(BAUDRATE);
unsigned char data;
// all are 1, all as output
DDRB = 0xFF;
while(1)
{
data = USART_Receive();
PORTB = data;
USART_Transmit(data);
}
}
I have commented on Greg's answer, but would like to add one more thing. For this sort of problem the gold standard method of debugging it is to first understand asynchronous serial communications, then to get an oscilloscope and see what's happening on the line. If characters are being exchanged and it's just a baudrate problem this will be particularly helpful as you can calculate the baudrate you are seeing and then adjust the divisor accordingly.
Here is a super quick primer, no doubt you can find something much more comprehensive on Wikipedia or elsewhere.
Let's assume 8 bits, no parity, 1 stop bit (the most common setup). Then if the character being transmitted is say 0x3f (= ascii '?'), then the line looks like this;
...--+ +---+---+---+---+---+---+ +---+--...
| S | 1 1 1 1 1 1 | 0 0 | E
+---+ +---+---+
The high (1) level is +5V at the chip and -12V after conversion to RS232 levels.
The low (0) level is 0V at the chip and +12V after conversion to RS232 levels.
S is the start bit.
Then we have 8 data bits, least significant first, so here 00111111 = 0x3f = '?'.
E is the stop (e for end) bit.
Time is advancing from left to right, just like an oscilloscope display, If the baudrate is 4800, then each bit spans (1/4800) seconds = 0.21 milliseconds (approx).
The receiver works by sampling the line and looking for a falling edge (a quiescent line is simply logical '1' all the time). The receiver knows the baudrate, and the number of start bits (1), so it measures one half bit time from the falling edge to find the middle of the start bit, then samples the line 8 bit times in succession after that to collect the data bits. The receiver then waits one more bit time (until half way through the stop bit) and starts looking for another start bit (i.e. falling edge). Meanwhile the character read is made available to the rest of the system. The transmitter guarantees that the next falling edge won't begin until the stop bit is complete. The transmitter can be programmed to always wait longer (with additional stop bits) but that is a legacy issue, extra stop bits were only required with very slow hardware and/or software setups.
I don't have reference material handy, but the baud rate register UBRR usually contains a divisor value, rather than the desired baud rate itself. A quick google search indicates that the correct divisor value for 4800 baud may be 239. So try:
divisor = 239;
UBRR0H = (unsigned char)(divisor >> 8);
UBRR0L = (unsigned char)divisor;
If this doesn't work, check with the reference docs for your particular chip for the correct divisor calculation formula.
For debugging UART communication, there are two useful things to do:
1) Do a loop-back at the connector and make sure you can read back what you write. If you send a character and get it back exactly, you know that the hardware is wired correctly, and that at least the basic set of UART register configuration is correct.
2) Repeatedly send the character 0x55 ("U") - the binary bit pattern 01010101 will allow you to quickly see the bit width on the oscilloscope, which will let you verify that the speed setting is correct.
After reading the data sheet a little more thoroughly, I was incorrectly setting the baudrate. The ATmega162 data sheet had a chart of clock frequencies plotted against baud rates and the corresponding error.
For a 4800 baud rate and a 1 MHz clock frequency, the error was 0.2%, which was acceptable for me. The trick was passing 12 to the USART_Init() function, instead of 4800.
Hope this helps someone else out!