Getting fuel% from analog data - embedded

I am getting analog voltage data, in mV, from a fuel gauge. The calibration readings were taken for every 10% change in the fuel gauge as mentioned below :
0% - 2000mV
10% - 2100mV
20% - 3200mV
30% - 3645mV
40% - 3755mV
50% - 3922mV
60% - 4300mV
70% - 4500mv
80% - 5210mV
90% - 5400mV
100% - 5800mV
The tank capacity is 45L.
Post calibration, I am getting reading from adc as let's say, 3000mV. How to calculate the exact % of fuel left in the tank?

If you plot the transfer function of ADC reading agaist the percentage tank contents you get a graph like this
There appears to be a fair degree of non linearity in the relationship between the sensor and the measured quantity. This could be down to a measurement error that was made while performing the calibration or it could be a true non linear relationship between the sensor reading and the tank contents. Using these results will give fairly inaccurate estimates of tank contents due to the non linearity of the transfer function.
If the relationship is linear or can be described by another mathematical relationship then you can perform an interpolation between known points using this mathematical relationship.
If the relationship is not linear than you will need many more known points in your calibration data so that the errors due to the interpolation between points is minimised.
The percentage value corresponding to the ADC reading can be approximated by finding the entries in the calibration above and below the reading that has been taken - for the ADC reading example in the question these would be the 10% and 20% values
Interpolation_Proportion = (ADC - ADC_Below) / (ADC_Above - ADC_Below) ;
Percent = Percent_Below + (Interpolation_Proportion * (Percent_Above - Percent_Below)) ;
.
Interpolation proportion = (3000-2100)/(3200-2100)
= 900/1100
= 0.82
Percent = 10 + (0.82 * (20 - 10)
= 10 + 8.2
= 18.2%
Capacity = 45 * 18.2 / 100
= 8.19 litres

When plotted it appears that the data id broadly linear, with some outliers. It is likely that this is experimental error or possibly influenced by confounding factors such as electrical noise or temperature variation, or even just the the liquid slopping around! Without details of how the data was gathered and how carefully, it is not possible to determine, but I would ask how many samples were taken per measurement, whether these are averaged or instantaneous and whether the results are exactly repeatable over more than one experiment?
Assuming the results are "indicative" only, then it is probably wisest from the data you do have to assume that the transfer function is linear, and to perform a linear regression from the scatter plot of your test data. That can be most done easily using any spreadsheet charting "trendline" function:
From your date the transfer function is:
Fuel% = (0.0262 x SensormV) - 54.5
So for your example 3000mV, Fuel% = (0.0262 x 3000) - 54.5 = 24.1%
For your 45L tank that equates to about 10.8 Litres.

Related

What is the difference between temperature sensor characteristic vs. sensor calibration

I'm configuring the internal temperature sensor of a STM32F4 MCU, and when reading the documentation, I came across some "duplicated" but divergent definitions.
Take a look in the image below:
The temperature sensor is connected to the ADC1, channel 16. When reading the ADC value inside my room, I always get values around ~920;
The values for the calibrations (read from MCU memory) are the following:
TS_CAL1 = 941
TS_CAL2 = 1199
It seems to me that calculating the final temperature using the relationship shown on Table 69 leads to different results from when using the relationship from Table 70.
Can anyone help me in this topic? What's the difference between the data of Tables 69 and 70? What is the purpose of each one? How to calculate the temperature correctly?
As Clifford explained in the comments, the information in table 69 tells you the typical behaviour of any device from this family, whereas the pointers in table 70 give you the address of the calibration data for your particular device which were measured in the factory.
If you told me that some device of this type gave a reading of 920, I would estimate the temperature as follows:
ADC voltage = 920/4096 * 3.3V = 741mV
Voltage offset from V(25C) = 741mV - 760mV = -19mV
Temperature offset from 25C = -19/2.5 = 7.6C
Temperature = 25 - 7.6 = 17.4 degrees C
For your calibrated device I would estimate the temperature like this:
Slope = (1199 - 941) / (110 - 30) = 3.225 LSB/degree
ADC offset from ADC(30C) = (920 - 941) = -21 LSB
Temperature offset from 30C = -21 / 3.225 = 17.775 C
Temperature = 30 - 17.775 = 12.2 degrees C
It is important to note however that although this second number is "calibrated", it is done so using calibration data from much higher temperatures. To use it below 30 degrees requires to extrapolate in a way which may not be physically valid.
Ask yourself, was the room closer to 17 degrees or 12 degrees? Bear in mind that the internal temperature sensor is probably subject to a certain amount of self-heating from the high performance processor.
If you want to use the internal temperature sensor to measure low temperatures outside the calibration range like this it might be appropriate to use the offset from the lower calibration point, but then use the typical slope from the datasheet.
Note also that many STM32 evaluation boards run at 3.0V not 3.3V, so all the calculation will have to be changed if that is the case.
I can not see the table but in theory there are following points need to considers when working with sensor:
Offset: value that is different with the actual value. You may check by feed a constant voltage to system and compare with ADC value measured.
Error of sensor: you might need to measure at known value. For example, for temperature, it is used to measure at 0 temperature which is a steady state.

sampling 2-dimensional surface: how many sample points along X & Y axes?

I have a set of first 25 Zernike polynomials. Below are shown few in Cartesin co-ordinate system.
z2 = 2*x
z3 = 2*y
z4 = sqrt(3)*(2*x^2+2*y^2-1)
:
:
z24 = sqrt(14)*(15*(x^2+y^2)^2-20*(x^2+y^2)+6)*(x^2-y^2)
I am not using 1st since it is piston; so I have these 24 two-dim ANALYTICAL functions expressed in X-Y Cartesian co-ordinate system. All are defined over unit circle, as they are orthogonal over unit circle. The problem which I am describing here is relevant to other 2D surfaces also apart from Zernike Polynomials.
Suppose that origin (0,0) of the XY co-ordinate system and the centre of the unit circle are same.
Next, I take linear combination of these 24 polynomials to build a 2D wavefront shape. I use 24 random input coefficients in this combination.
w(x,y) = sum_over_i a_i*z_i (i=2,3,4,....24)
a_i = random coefficients
z_i = zernike polynomials
Upto this point, everything is analytical part which can be done on paper.
Now comes the discretization!
I know that when you want to re-construct a signal (1Dim/2Dim), your sampling frequency should be at least twice the maximum frequency present in the signal (Nyquist-Shanon principle).
Here signal is w(x,y) as mentioned above which is nothing but a simple 2Dim
function of x & y. I want to represent it on computer now. Obviously I can not take all infinite points from -1 to +1 along x axis and same for y axis.
I have to take finite no. of data points (which are called sample points or just samples) on this analytical 2Dim surface w(x,y)
I am measuring x & y in metres, and -1 <= x <= +1; -1 <= y <= +1.
e.g. If I divide my x-axis from -1 to 1, in 50 sample points then dx = 2/50= 0.04 metre. Same for y axis. Now my sampling frequency is 1/dx i.e. 25 samples per metre. Same for y axis.
But I took 50 samples arbitrarily; I could have taken 10 samples or 1000 samples. That is the crux of the matter here: how many samples points?How will I determine this number?
There is one theorem (Nyquist-Shanon theorem) mentioned above which says that if I want to re-construct w(x,y) faithfully, I must sample it on both axes so that my sampling frequency (i.e. no. of samples per metre) is at least twice the maximum frequency present in the w(x,y). This is nothing but finding power spectrum of w(x,y). Idea is that any function in space domain can be represented in spatial-frequency domain also, which is nothing but taking Fourier transform of the function! This tells us how many (spatial) frequencies are present in your function w(x,y) and what is the maximum frequency out of these many frequencies.
Now my question is first how to find out this maximum sampling frequency in my case. I can not use MATLAB fft2() or any other tool since it means already I have samples taken across the wavefront!! Obviously remaining option is find it analytically ! But that is time consuming and difficult since I have 24 polynomials & I will have to use then continuous Fourier transform i.e. I will have to go for pen and paper.
Any help will be appreciated.
Thanks
Key Assumptions
You want to use the "Nyquist-Shanon" theorem to determine sampling frequency
Obviously remaining option is find it analytically ! But that is time
consuming and difficult since I have 21 polynomials & I have to use
continuous Fourier transform i.e. done by analytically.
Given the assumption I have made (and noting that consideration of other mathematical techniques is out of scope for StackOverflow), you have no option but to calculate the continuous Fourier Transform.
However, I believe you haven't considered all the options for calculating the transform other than a laborious paper exercise e.g.
Numerical approximation of the continuous F.T. using code
Symbolic Integration e.g. Wolfram Alpha
Surely a numerical approximation of the Fourier Transform will be adequate for your solution?
I am assuming this is for coursework or research rather, so all you really care about as a physicist is a solution that is the quickest solution that is accurate within the scope of your problem.
So to conclude, IMHO, don't waste time searching for a more mathematically elegant solution or trick and just solve the problem with one of the above methods

iOS - High pass filter equation for accelerometer

Could someone explain how one arrives at the equation below for high pass filtering of the accelerometer values? I don't need mathematical derivation, just an intuitive interpretation of it is enough.
#define kFilteringFactor 0.1
UIAccelerationValue rollingX, rollingY, rollingZ;
- (void)accelerometer:(UIAccelerometer *)accelerometer didAccelerate:(UIAcceleration *)acceleration {
// Subtract the low-pass value from the current value to get a simplified high-pass filter
rollingX = (acceleration.x * kFilteringFactor) + (rollingX * (1.0 - kFilteringFactor));
rollingY = (acceleration.y * kFilteringFactor) + (rollingY * (1.0 - kFilteringFactor));
rollingZ = (acceleration.z * kFilteringFactor) + (rollingZ * (1.0 - kFilteringFactor));
float accelX = acceleration.x - rollingX;
float accelY = acceleration.y - rollingY;
float accelZ = acceleration.z - rollingZ;
// Use the acceleration data.
}
While the other answers are correct, here is an simplistic explanation. With kFilteringFactor 0.1 you are taking 10% of the current value and adding 90% of the previous value. Therefore the value retains a 90% similarity to the previous value, which increases its resistance to sudden changes. This decreases noise but it also makes it less responsive to changes in the signal. To reduce noise and keep it responsive you would need non trivial filters, eg: Complementary, Kalman.
The rollingX, rollingY, rollingZ values are persistent across calls to the function. They should be initialised at some point prior to use. These "rolling" values are just low pass filtered versions of the input values (aka "moving averages") which are subtracted from the instantaneous values to give you a high pass filtered output, i.e. you are getting the current deviation from the moving average.
ADDITIONAL EXPLANATION
A moving average is just a crude low pass filter. In this case it's what is known as ARMA (auto-regressive moving average) rather than just a simple MA (moving average). In DSP terms this is a recursive (IIR) filter rather than a non-recursive (FIR) filter. Regardless of all the terminology though, yes, you can think of it as a smoothing function" - it's "smoothing out" all the high frequency energy and leaving you with a slowly varying estimate of the mean value of the signal. If you then subtract this smoothed signal from the instantaneous signal then the difference will be the content that you have filtered out, i.e. the high frequency stuff, hence you get a high pass filter. In other words: high_pass_filtered_signal = signal - smoothed_signal.
Okay, what that code is doing is calculating a low-pass signal and then substracting the current value.
Thing of a square wave that takes two values 5 and 10. In other words, it oscillates between 5 and 10. Then the low pass signal is trying to find the mean (7.5). The high-pass signal is then calculated as current value minus mean, i.e. 10 - 7.5 = 2.5, or 5 - 7.5 = -2.5.
The low-pass signal is computed by integrating over past values by adding a fraction of the current value to 90% of the past low-pass value.

How to calculate deceleration needed to reach a certain speed over a certain distance?

I've tried the typical physics equations for this but none of them really work because the equations deal with constant acceleration and mine will need to change to work correctly. Basically I have a car that can be going at a large range of speeds and needs to slow down and stop over a given distance and time as it reaches the end of its path.
So, I have:
V0, or the current speed
Vf, or the speed I want to reach (typically 0)
t, or the amount of time I want to take to reach the end of my path
d, or the distance I want to go as I change from V0 to Vf
I want to calculate
a, or the acceleration needed to go from V0 to Vf
The reason this becomes a programming-specific question is because a needs to be recalculated every single timestep as the car keeps stopping. So, V0 constantly is changed to be V0 from last timestep plus the a that was calculated last timestep. So essentially it will start stopping slowly then will eventually stop more abruptly, sort of like a car in real life.
EDITS:
All right, thanks for the great responses. A lot of what I needed was just some help thinking about this. Let me be more specific now that I've got some more ideas from you all:
I have a car c that is 64 pixels from its destination, so d=64. It is driving at 2 pixels per timestep, where a timestep is 1/60 of a second. I want to find the acceleration a that will bring it to a speed of 0.2 pixels per timestep by the time it has traveled d.
d = 64 //distance
V0 = 2 //initial velocity (in ppt)
Vf = 0.2 //final velocity (in ppt)
Also because this happens in a game loop, a variable delta is passed through to each action, which is the multiple of 1/60s that the last timestep took. In other words, if it took 1/60s, then delta is 1.0, if it took 1/30s, then delta is 0.5. Before acceleration is actually applied, it is multiplied by this delta value. Similarly, before the car moves again its velocity is multiplied by the delta value. This is pretty standard stuff, but it might be what is causing problems with my calculations.
Linear acceleration a for a distance d going from a starting speed Vi to a final speed Vf:
a = (Vf*Vf - Vi*Vi)/(2 * d)
EDIT:
After your edit, let me try and gauge what you need...
If you take this formula and insert your numbers, you get a constant acceleration of -0,0309375. Now, let's keep calling this result 'a'.
What you need between timestamps (frames?) is not actually the acceleration, but new location of the vehicle, right? So you use the following formula:
Sd = Vi * t + 0.5 * t * t * a
where Sd is the current distance from the start position at current frame/moment/sum_of_deltas, Vi is the starting speed, and t is the time since the start.
With this, your decceleration is constant, but even if it is linear, your speed will accomodate to your constraints.
If you want a non-linear decceleration, you could find some non-linear interpolation method, and interpolate not acceleration, but simply position between two points.
location = non_linear_function(time);
The four constraints you give are one too many for a linear system (one with constant acceleration), where any three of the variables would suffice to compute the acceleration and thereby determine the fourth variables. However, the system is way under-specified for a completely general nonlinear system -- there may be uncountably infinite ways to change acceleration over time while satisfying all the constraints as given. Can you perhaps specify better along what kind of curve acceleration should change over time?
Using 0 index to mean "at the start", 1 to mean "at the end", and D for Delta to mean "variation", given a linearly changing acceleration
a(t) = a0 + t * (a1-a0)/Dt
where a0 and a1 are the two parameters we want to compute to satisfy all the various constraints, I compute (if there's been no misstep, as I did it all by hand):
DV = Dt * (a0+a1)/2
Ds = Dt * (V0 + ((a1-a0)/6 + a0/2) * Dt)
Given DV, Dt and Ds are all given, this leaves 2 linear equations in the unknowns a0 and a1 so you can solve for these (but I'm leaving things in this form to make it easier to double check on my derivations!!!).
If you're applying the proper formulas at every step to compute changes in space and velocity, it should make no difference whether you compute a0 and a1 once and for all or recompute them at every step based on the remaining Dt, Ds and DV.
If you're trying to simulate a time-dependent acceleration in your equations, it just means that you should assume that. You have to integrate F = ma along with the acceleration equations, that's all. If acceleration isn't constant, you just have to solve a system of equations instead of just one.
So now it's really three vector equations that you have to integrate simultaneously: one for each component of displacement, velocity, and acceleration, or nine equations in total. The force as a function of time will be an input for your problem.
If you're assuming 1D motion you're down to three simultaneous equations. The ones for velocity and displacement are both pretty easy.
In real life, a car's stopping ability depends on the pressure on the brake pedal, any engine braking that's going on, surface conditions, and such: also, there's that "grab" at the end when the car really stops. Modeling that is complicated, and you're unlikely to find good answers on a programming website. Find some automotive engineers.
Aside from that, I don't know what you're asking for. Are you trying to determine a braking schedule? As in there's a certain amount of deceleration while coasting, and then applying the brake? In real driving, the time is not usually considered in these maneuvers, but rather the distance.
As far as I can tell, your problem is that you aren't asking for anything specific, which suggests that you really haven't figured out what you actually want. If you'd provide a sample use for this, we could probably help you. As it is, you've provided the bare bones of a problem that is either overdetermined or way underconstrained, and there's really nothing we can do with that.
if you need to go from 10m/s to 0m/s in 1m with linear acceleration you need 2 equations.
first find the time (t) it takes to stop.
v0 = initial velocity
vf = final velocity
x0 = initial displacement
xf = final displacement
a = constant linear acceleration
(xf-x0)=.5*(v0-vf)*t
t=2*(xf-x0)/(v0-vf)
t=2*(1m-0m)/(10m/s-0m/s)
t=.2seconds
next to calculate the linear acceleration between x0 & xf
(xf-x0)=(v0-vf)*t+.5*a*t^2
(1m-0m)=(10m/s-0m/s)*(.2s)+.5*a*((.2s)^2)
1m=(10m/s)*(.2s)+.5*a*(.04s^2)
1m=2m+a*(.02s^2)
-1m=a*(.02s^2)
a=-1m/(.02s^2)
a=-50m/s^2
in terms of gravity (g's)
a=(-50m/s^2)/(9.8m/s^2)
a=5.1g over the .2 seconds from 0m to 10m
Problem is either overconstrained or underconstrained (a is not constant? is there a maximum a?) or ambiguous.
Simplest formula would be a=(Vf-V0)/t
Edit: if time is not constrained, and distance s is constrained, and acceleration is constant, then the relevant formulae are s = (Vf+V0)/2 * t, t=(Vf-V0)/a which simplifies to a = (Vf2 - V02) / (2s).

VB FFT - stuck understanding relationship of results to frequency

Trying to understand an fft (Fast Fourier Transform) routine I'm using (stealing)(recycling)
Input is an array of 512 data points which are a sample waveform.
Test data is generated into this array. fft transforms this array into frequency domain.
Trying to understand relationship between freq, period, sample rate and position in fft array. I'll illustrate with examples:
========================================
Sample rate is 1000 samples/s.
Generate a set of samples at 10Hz.
Input array has peak values at arr(28), arr(128), arr(228) ...
period = 100 sample points
peak value in fft array is at index 6 (excluding a huge value at 0)
========================================
Sample rate is 8000 samples/s
Generate set of samples at 440Hz
Input array peak values include arr(7), arr(25), arr(43), arr(61) ...
period = 18 sample points
peak value in fft array is at index 29 (excluding a huge value at 0)
========================================
How do I relate the index of the peak in the fft array to frequency ?
If you ignore the imaginary part, the frequency distribution is linear across bins:
Frequency#i = (Sampling rate/2)*(i/Nbins).
So for your first example, assumming you had 256 bins, the largest bin corresponds to a frequency of 1000/2 * 6/256 = 11.7 Hz.
Since your input was 10Hz, I'd guess that bin 5 (9.7Hz) also had a big component.
To get better accuracy, you need to take more samples, to get smaller bins.
Your second example gives 8000/2*29/256 = 453Hz. Again, close, but you need more bins.
Your resolution here is only 4000/256 = 15.6Hz.
It would be helpful if you were to provide your sample dataset.
My guess would be that you have what are called sampling artifacts. The strong signal at DC ( frequency 0 ) suggests that this is the case.
You should always ensure that the average value in your input data is zero - find the average and subtract it from each sample point before invoking the fft is good practice.
Along the same lines, you have to be careful about the sampling window artifact. It is important that the first and last data point are close to zero because otherwise the "step" from outside to inside the sampling window has the effect of injecting a whole lot of energy at different frequencies.
The bottom line is that doing an fft analysis requires more care than simply recycling a fft routine found somewhere.
Here are the first 100 sample points of a 10Hz signal as described in the question, massaged to avoid sampling artifacts
> sinx[1:100]
[1] 0.000000e+00 6.279052e-02 1.253332e-01 1.873813e-01 2.486899e-01 3.090170e-01 3.681246e-01 4.257793e-01 4.817537e-01 5.358268e-01
[11] 5.877853e-01 6.374240e-01 6.845471e-01 7.289686e-01 7.705132e-01 8.090170e-01 8.443279e-01 8.763067e-01 9.048271e-01 9.297765e-01
[21] 9.510565e-01 9.685832e-01 9.822873e-01 9.921147e-01 9.980267e-01 1.000000e+00 9.980267e-01 9.921147e-01 9.822873e-01 9.685832e-01
[31] 9.510565e-01 9.297765e-01 9.048271e-01 8.763067e-01 8.443279e-01 8.090170e-01 7.705132e-01 7.289686e-01 6.845471e-01 6.374240e-01
[41] 5.877853e-01 5.358268e-01 4.817537e-01 4.257793e-01 3.681246e-01 3.090170e-01 2.486899e-01 1.873813e-01 1.253332e-01 6.279052e-02
[51] -2.542075e-15 -6.279052e-02 -1.253332e-01 -1.873813e-01 -2.486899e-01 -3.090170e-01 -3.681246e-01 -4.257793e-01 -4.817537e-01 -5.358268e-01
[61] -5.877853e-01 -6.374240e-01 -6.845471e-01 -7.289686e-01 -7.705132e-01 -8.090170e-01 -8.443279e-01 -8.763067e-01 -9.048271e-01 -9.297765e-01
[71] -9.510565e-01 -9.685832e-01 -9.822873e-01 -9.921147e-01 -9.980267e-01 -1.000000e+00 -9.980267e-01 -9.921147e-01 -9.822873e-01 -9.685832e-01
[81] -9.510565e-01 -9.297765e-01 -9.048271e-01 -8.763067e-01 -8.443279e-01 -8.090170e-01 -7.705132e-01 -7.289686e-01 -6.845471e-01 -6.374240e-01
[91] -5.877853e-01 -5.358268e-01 -4.817537e-01 -4.257793e-01 -3.681246e-01 -3.090170e-01 -2.486899e-01 -1.873813e-01 -1.253332e-01 -6.279052e-02
And here is the resulting absolute values of the fft frequency domain
[1] 7.160038e-13 1.008741e-01 2.080408e-01 3.291725e-01 4.753899e-01 6.653660e-01 9.352601e-01 1.368212e+00 2.211653e+00 4.691243e+00 5.001674e+02
[12] 5.293086e+00 2.742218e+00 1.891330e+00 1.462830e+00 1.203175e+00 1.028079e+00 9.014559e-01 8.052577e-01 7.294489e-01
I'm a little rusty too on math and signal processing but with the additional info I can give it a shot.
If you want to know the signal energy per bin you need the magnitude of the complex output. So just looking at the real output is not enough. Even when the input is only real numbers. For every bin the magnitude of the output is sqrt(real^2 + imag^2), just like pythagoras :-)
bins 0 to 449 are positive frequencies from 0 Hz to 500 Hz. bins 500 to 1000 are negative frequencies and should be the same as the positive for a real signal. If you process one buffer every second frequencies and array indices line up nicely. So the peak at index 6 corresponds with 6Hz so that's a bit strange. This might be because you're only looking at the real output data and the real and imaginary data combine to give an expected peak at index 10. The frequencies should map linearly to the bins.
The peaks at 0 indicates a DC offset.
It's been some time since I've done FFT's but here's what I remember
FFT usually takes complex numbers as input and output. So I'm not really sure how the real and imaginary part of the input and output map to the arrays.
I don't really understand what you're doing. In the first example you say you process sample buffers at 10Hz for a sample rate of 1000 Hz? So you should have 10 buffers per second with 100 samples each. I don't get how your input array can be at least 228 samples long.
Usually the first half of the output buffer are frequency bins from 0 frequency (=dc offset) to 1/2 sample rate. and the 2nd half are negative frequencies. if your input is only real data with 0 for the imaginary signal positive and negative frequencies are the same. The relationship of real/imaginary signal on the output contains phase information from your input signal.
The frequency for bin i is i * (samplerate / n), where n is the number of samples in the FFT's input window.
If you're handling audio, since pitch is proportional to log of frequency, the pitch resolution of the bins increases as the frequency does -- it's hard to resolve low frequency signals accurately. To do so you need to use larger FFT windows, which reduces time resolution. There is a tradeoff of frequency against time resolution for a given sample rate.
You mention a bin with a large value at 0 -- this is the bin with frequency 0, i.e. the DC component. If this is large, then presumably your values are generally positive. Bin n/2 (in your case 256) is the Nyquist frequency, half the sample rate, which is the highest frequency that can be resolved in the sampled signal at this rate.
If the signal is real, then bins n/2+1 to n-1 will contain the complex conjugates of bins n/2-1 to 1, respectively. The DC value only appears once.
The samples are, as others have said, equally spaced in the frequency domain (not logarithmic).
For example 1, you should get this:
alt text http://home.comcast.net/~kootsoop/images/SINE1.jpg
For the other example you should get
alt text http://home.comcast.net/~kootsoop/images/SINE2.jpg
So your answers both appear to be correct regarding the peak location.
What I'm not getting is the large DC component. Are you sure you are generating a sine wave as the input? Does the input go negative? For a sinewave, the DC should be close to zero provided you get enough cycles.
Another avenue is to craft a Goertzel's Algorithm of each note center frequency you are looking for.
Once you get one implementation of the algorithm working you can make it such that it takes parameters to set it's center frequency. With that you could easily run 88 of them or what ever you need in a collection and scan for the peak value.
The Goertzel Algorithm is basically a single bin FFT. Using this method you can place your bins logarithmically as musical notes naturally go.
Some pseudo code from Wikipedia:
s_prev = 0
s_prev2 = 0
coeff = 2*cos(2*PI*normalized_frequency);
for each sample, x[n],
s = x[n] + coeff*s_prev - s_prev2;
s_prev2 = s_prev;
s_prev = s;
end
power = s_prev2*s_prev2 + s_prev*s_prev - coeff*s_prev2*s_prev;
The two variables representing the previous two samples are maintained for the next iteration. This can be then used in a streaming application. I thinks perhaps the power calculation should be inside the loop as well. (However it is not depicted as such in the Wiki article.)
In the tone detection case there would be 88 different coeficients, 88 pairs of previous samples and would result in 88 power output samples indicating the relative level in that frequency bin.
WaveyDavey says that he's capturing sound from a mic, thru the audio hardware of his computer, BUT that his results are not zero-centered. This sounds like a problem with the hardware. It SHOULD BE zero-centered.
When the room is quiet, the stream of values coming from the sound API should be very close to 0 amplitude, with slight +- variations for ambient noise. If a vibratory sound is present in the room (e.g. a piano, a flute, a voice) the data stream should show a fundamentally sinusoidal-based wave that goes both positive and negative, and averages near zero. If this is not the case, the system has some funk going on!
-Rick