How do sum a repeating sequences of numbers? - sql

I'm collecting sensory data using two Raspberry Pi's for redundancy and sending the data to a server. The sensory data broadcasts distance (mile 0.1, 0.0, 0.4 ...). Eventually the count gets high and it starts back at zero. (99.8, 99.9, 0, 0.1 ...).
a) How do a run a SQL command to get to total distance travelled? (Where the count resets is variable. Could be Mile 10)
b) How do I get a more accurate sum using both sets of data? (One set might have 0, 10, 78 and the other 1, 12, 81. The total distance being 81-0 or 81.

Related

Simple Moving Average, but ignoring zeroes

How do I calculate simple moving average, but ignoring zeroes.
For example, I have a series of values e.g. 5.95, 2.5, 0, 0, 6.5, 0, 2.4, 0, 1.9 and so on. All values are positive (greater than zero). In this case, the average that I am looking for is (5.95+2.5+6.5+2.4+1.9)/5 = 3.85
I have 8 difference variables in a script. I am trying to calculate a moving average for them.
On a side note, while googling for this problem, I came across filters and transforms. Is there any such mathematical concept that can be applied to my problem.

How to aggregate edge-based simulation output to grid cells?

I have a network that has to be divided into a grid of square cells (200 X 200 m). each cell includes sub-segments of the edges.
I have generated the simulation data output and used sumolib to extract edge-based output. I have to calculate the average traffic volume in each cell (not edge) measured in (vehicles/second).
this is part of the script I have written:
extract edge-based density and speed values:
for interval in sumolib.output.parse('cairo.edgeDataOutput.xml','interval'):
for edge in interval.edge:
edgeDataOutput[edge.id]= (edge.density,edge.speed)
after saving density and speed into edgeDataOutput, I have to aggregate into cells and calculate avg.traffic volume in each cell:
for cellID in ids:
density=0
speed=0
n=0 #avg.traffic vol
for edgeID in cell_edgeMap[cellID].keys():
if edgeID in edgeDataOutput.keys():
density+= float(edgeDataOutput[edgeID][1])
speed+= float(edgeDataOutput[edgeID][2])
n += (float(edgeDataOutput[edgeID][1])/1000) * float(edgeDataOutput[edgeID][2]) #traffic vol = (density/1000)*speed
densities.append(int((density / len(cell_edgeMap[cellID].keys()))+0.5))
speeds.append(int((speed / len(cell_edgeMap[cellID].keys()))+0.5))
numOfVehicles.append(int(n/len(cell_edgeMap[cellID].keys())))
as you can see from the code, I sum up the density, speed values of each edge that is in the cell then divide by the number of edges inside the cell to get the mean value.
density at cell(veh/Km) = sum(density at each edge inside cell)/num of edges inside cell.
speed at cell(m/s) = sum(speed at each edge inside cell)/num of edges in cell.
and I am using the following formula to calculate the traffic volume at each cell:
avg.traffic volume at cell(veh/s) = sum(avg.traffic volume at each edge inside cell)/num of edges inside cell.
avg.traffic volume at edge(veh/s) = density at edge(veh/Km) * speed at edge(m/s) / 1000.
I just want to make sure that I am using the write formula.
It is not easy to answer whether you do the right thing because averaging over both time and space is always hard. Furthermore it is not clear what you are trying to measure. The traffic volume usually denotes the total (or average) number of vehicles and is measured without unit (so only number of vehicles). The traffic flow is measured in vehicles per time unit but is usually applied only to a cross section not to an area. If you want the average number of cars in the cell it should suffice to divide the sum of the number of sampleSeconds by the length of the interval. The second value needs a more in depth discussion but I would probably at least multiply it with the edge length when summing up.

Computing the approximate LCM of a set of numbers

I'm writing a tone generator program for a microcontroller.
I use an hardware timer to trigger an interrupt and check if I need to set the signal to high or low in a particular moment for a given note.
I'm using pretty limited hardware, so the slower I run the timer the more time I have to do other stuff (serial communication, loading the next notes to generate, etc.).
I need to find the frequency at which I should run the timer to have an optimal result, which is, generate a frequency that is accurate enough and still have time to compute the other stuff.
To achieve this, I need to find an approximate (within some percent value, as the higher are the frequencies the more they need to be imprecise in value for a human ear to notice the error) LCM of all the frequencies I need to play: this value will be the frequency at which to run the hardware timer.
Is there a simple enough algorithm to compute such number? (EDIT, I shall clarify "simple enough": fast enough to run in a time t << 1 sec. for less than 50 values on a 8 bit AVR microcontroller and implementable in a few dozens of lines at worst.)
LCM(a,b,c) = LCM(LCM(a,b),c)
Thus you can compute LCMs in a loop, bringing in frequencies one at a time.
Furthermore,
LCM(a,b) = a*b/GCD(a,b)
and GCDs are easily computed without any factoring by using the Euclidean algorithm.
To make this an algorithm for approximate LCMs, do something like round lower frequencies to multiples of 10 Hz and higher frequencies to multiples of 50 Hz. Another idea that is a bit more principled would be to first convert the frequency to an octave (I think that the formula is f maps to log(f/16)/log(2)) This will give you a number between 0 and 10 (or slightly higher --but anything above 10 is almost beyond human hearing so you could perhaps round down). You could break 0-10 into say 50 intervals 0.0, 0.2, 0.4, ... and for each number compute ahead of time the frequency corresponding to that octave (which would be f = 16*2^o where o is the octave). For each of these -- go through by hand once and for all and find a nearby round number that has a number of smallish prime factors. For example, if o = 5.4 then f = 675.58 -- round to 675; if o = 5.8 then f = 891.44 -- round to 890. Assemble these 50 numbers into a sorted array, using binary search to replace each of your frequencies by the closest frequency in the array.
An idea:
project the frequency range to a smaller interval
Let's say your frequency range is from 20 to 20000 and you aim for a 2% accurary, you'll calculate for a 1-50 range. It has to be a non-linear transformation to keep the accurary for lower frequencies. The goal is both to compute the result faster and to have a smaller LCM.
Use a prime factors table to easily compute the LCM on that reduced range
Store the pre-calculated prime factors powers in an array (size about 50x7 for range 1-50), and then use it for the LCM: the LCM of a number is the product of multiplying the highest power of each prime factor of the number together. It's easy to code and blazingly fast to run.
Do the first step in reverse to get the final number.

clFFT performance evaluation

I have been working on performance evaluation of the clFFT library AMD Radeon R7 260x. The CPU is intel xeon inside and OS is centOS.
I have been studying the performance of 2D 16x16 clFFT with different batch modes (Parallel FFTs). I wondered to see the different results obtained from especially event profiling and gettimeofday.
The results of 2D 16x16 clFFT with different batch modes are as following,
Using EventProfiling:
batch kernel exec time(us)
1 320.7
16 461.1
256 458.3
512 537.7
1024 1016.8
Here, the batch represents the parallel FFTs and the kernel execution time represents the execution time in micro seconds.
Using gettimeofday
batch HtoD(us) kernelExecTime(us) DtoH(us)
1 29653 10850 39227
16 28313 10786 32474
256 26995 11167 39672
512 26145 10773 32273
1024 26856 11948 31060
Here, the batch represents the parallel FFTs, H to D represents data transfer time from host to device, the kernel exec time represents the kernel execution time and D to H represents the data transfer time from device to host and all are in micro seconds.
(I am sorry as I cant show you the results in good table format, I can not able to add tables here. hope you can read still).
Here are my questions,
1a) Why the kernel times obtained from EventProfiling are completely different from that of gettimeofday?
1b) Here the another question is that, which results are correct?
2) The data (w.r.t size) transfers increases as the batch size increases. Bur from the results of the gettimeofday, the data transfer times either the H to D or D to H are almost constant instead of growing as the batch size increases from 1 to 1024. Why is that?
clFinish( cl_queue);
// Copy data from host to device
gettimeofday(&t_s_gpu1, NULL);
clEnqueueWriteBuffer( cl_queue, d_data, CL_TRUE, 0, width*height*batchSize*sizeof(cl_compl_flt), h_src, 0, NULL, &event1);
clFinish( cl_queue);
clWaitForEvents(1, &event1);
gettimeofday(&t_e_gpu1, NULL);
checkCL( clAmdFftBakePlan( fftPlan, 1, &cl_queue, NULL, NULL) );
clAmdFftSetPlanBatchSize( fftPlan, batchSize );
clFinish( cl_queue);
gettimeofday(&t_s_gpu, NULL);
checkCL( clAmdFftEnqueueTransform( fftPlan, CLFFT_FORWARD, 1, &cl_queue, 0, NULL, &event, &d_data, NULL, NULL) );
clFinish( cl_queue);
clWaitForEvents(1, &event);
gettimeofday(&t_e_gpu, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(time_start), &time_start, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(time_end), &time_end, NULL);
totaltime=totaltime+time_end - time_start;
clFinish( cl_queue);
// Copy result from device to host
gettimeofday(&t_s_gpu2, NULL);
checkCL( clEnqueueReadBuffer(cl_queue, d_data, CL_TRUE, 0, width*height*batchSize*sizeof(cl_compl_flt), h_res, 0, NULL, &event2));
clFinish( cl_queue);
clWaitForEvents(1, &event2);
gettimeofday(&t_e_gpu2, NULL);
I will be looking for your comments and answers and load of thanks in advance.

VB FFT - stuck understanding relationship of results to frequency

Trying to understand an fft (Fast Fourier Transform) routine I'm using (stealing)(recycling)
Input is an array of 512 data points which are a sample waveform.
Test data is generated into this array. fft transforms this array into frequency domain.
Trying to understand relationship between freq, period, sample rate and position in fft array. I'll illustrate with examples:
========================================
Sample rate is 1000 samples/s.
Generate a set of samples at 10Hz.
Input array has peak values at arr(28), arr(128), arr(228) ...
period = 100 sample points
peak value in fft array is at index 6 (excluding a huge value at 0)
========================================
Sample rate is 8000 samples/s
Generate set of samples at 440Hz
Input array peak values include arr(7), arr(25), arr(43), arr(61) ...
period = 18 sample points
peak value in fft array is at index 29 (excluding a huge value at 0)
========================================
How do I relate the index of the peak in the fft array to frequency ?
If you ignore the imaginary part, the frequency distribution is linear across bins:
Frequency#i = (Sampling rate/2)*(i/Nbins).
So for your first example, assumming you had 256 bins, the largest bin corresponds to a frequency of 1000/2 * 6/256 = 11.7 Hz.
Since your input was 10Hz, I'd guess that bin 5 (9.7Hz) also had a big component.
To get better accuracy, you need to take more samples, to get smaller bins.
Your second example gives 8000/2*29/256 = 453Hz. Again, close, but you need more bins.
Your resolution here is only 4000/256 = 15.6Hz.
It would be helpful if you were to provide your sample dataset.
My guess would be that you have what are called sampling artifacts. The strong signal at DC ( frequency 0 ) suggests that this is the case.
You should always ensure that the average value in your input data is zero - find the average and subtract it from each sample point before invoking the fft is good practice.
Along the same lines, you have to be careful about the sampling window artifact. It is important that the first and last data point are close to zero because otherwise the "step" from outside to inside the sampling window has the effect of injecting a whole lot of energy at different frequencies.
The bottom line is that doing an fft analysis requires more care than simply recycling a fft routine found somewhere.
Here are the first 100 sample points of a 10Hz signal as described in the question, massaged to avoid sampling artifacts
> sinx[1:100]
[1] 0.000000e+00 6.279052e-02 1.253332e-01 1.873813e-01 2.486899e-01 3.090170e-01 3.681246e-01 4.257793e-01 4.817537e-01 5.358268e-01
[11] 5.877853e-01 6.374240e-01 6.845471e-01 7.289686e-01 7.705132e-01 8.090170e-01 8.443279e-01 8.763067e-01 9.048271e-01 9.297765e-01
[21] 9.510565e-01 9.685832e-01 9.822873e-01 9.921147e-01 9.980267e-01 1.000000e+00 9.980267e-01 9.921147e-01 9.822873e-01 9.685832e-01
[31] 9.510565e-01 9.297765e-01 9.048271e-01 8.763067e-01 8.443279e-01 8.090170e-01 7.705132e-01 7.289686e-01 6.845471e-01 6.374240e-01
[41] 5.877853e-01 5.358268e-01 4.817537e-01 4.257793e-01 3.681246e-01 3.090170e-01 2.486899e-01 1.873813e-01 1.253332e-01 6.279052e-02
[51] -2.542075e-15 -6.279052e-02 -1.253332e-01 -1.873813e-01 -2.486899e-01 -3.090170e-01 -3.681246e-01 -4.257793e-01 -4.817537e-01 -5.358268e-01
[61] -5.877853e-01 -6.374240e-01 -6.845471e-01 -7.289686e-01 -7.705132e-01 -8.090170e-01 -8.443279e-01 -8.763067e-01 -9.048271e-01 -9.297765e-01
[71] -9.510565e-01 -9.685832e-01 -9.822873e-01 -9.921147e-01 -9.980267e-01 -1.000000e+00 -9.980267e-01 -9.921147e-01 -9.822873e-01 -9.685832e-01
[81] -9.510565e-01 -9.297765e-01 -9.048271e-01 -8.763067e-01 -8.443279e-01 -8.090170e-01 -7.705132e-01 -7.289686e-01 -6.845471e-01 -6.374240e-01
[91] -5.877853e-01 -5.358268e-01 -4.817537e-01 -4.257793e-01 -3.681246e-01 -3.090170e-01 -2.486899e-01 -1.873813e-01 -1.253332e-01 -6.279052e-02
And here is the resulting absolute values of the fft frequency domain
[1] 7.160038e-13 1.008741e-01 2.080408e-01 3.291725e-01 4.753899e-01 6.653660e-01 9.352601e-01 1.368212e+00 2.211653e+00 4.691243e+00 5.001674e+02
[12] 5.293086e+00 2.742218e+00 1.891330e+00 1.462830e+00 1.203175e+00 1.028079e+00 9.014559e-01 8.052577e-01 7.294489e-01
I'm a little rusty too on math and signal processing but with the additional info I can give it a shot.
If you want to know the signal energy per bin you need the magnitude of the complex output. So just looking at the real output is not enough. Even when the input is only real numbers. For every bin the magnitude of the output is sqrt(real^2 + imag^2), just like pythagoras :-)
bins 0 to 449 are positive frequencies from 0 Hz to 500 Hz. bins 500 to 1000 are negative frequencies and should be the same as the positive for a real signal. If you process one buffer every second frequencies and array indices line up nicely. So the peak at index 6 corresponds with 6Hz so that's a bit strange. This might be because you're only looking at the real output data and the real and imaginary data combine to give an expected peak at index 10. The frequencies should map linearly to the bins.
The peaks at 0 indicates a DC offset.
It's been some time since I've done FFT's but here's what I remember
FFT usually takes complex numbers as input and output. So I'm not really sure how the real and imaginary part of the input and output map to the arrays.
I don't really understand what you're doing. In the first example you say you process sample buffers at 10Hz for a sample rate of 1000 Hz? So you should have 10 buffers per second with 100 samples each. I don't get how your input array can be at least 228 samples long.
Usually the first half of the output buffer are frequency bins from 0 frequency (=dc offset) to 1/2 sample rate. and the 2nd half are negative frequencies. if your input is only real data with 0 for the imaginary signal positive and negative frequencies are the same. The relationship of real/imaginary signal on the output contains phase information from your input signal.
The frequency for bin i is i * (samplerate / n), where n is the number of samples in the FFT's input window.
If you're handling audio, since pitch is proportional to log of frequency, the pitch resolution of the bins increases as the frequency does -- it's hard to resolve low frequency signals accurately. To do so you need to use larger FFT windows, which reduces time resolution. There is a tradeoff of frequency against time resolution for a given sample rate.
You mention a bin with a large value at 0 -- this is the bin with frequency 0, i.e. the DC component. If this is large, then presumably your values are generally positive. Bin n/2 (in your case 256) is the Nyquist frequency, half the sample rate, which is the highest frequency that can be resolved in the sampled signal at this rate.
If the signal is real, then bins n/2+1 to n-1 will contain the complex conjugates of bins n/2-1 to 1, respectively. The DC value only appears once.
The samples are, as others have said, equally spaced in the frequency domain (not logarithmic).
For example 1, you should get this:
alt text http://home.comcast.net/~kootsoop/images/SINE1.jpg
For the other example you should get
alt text http://home.comcast.net/~kootsoop/images/SINE2.jpg
So your answers both appear to be correct regarding the peak location.
What I'm not getting is the large DC component. Are you sure you are generating a sine wave as the input? Does the input go negative? For a sinewave, the DC should be close to zero provided you get enough cycles.
Another avenue is to craft a Goertzel's Algorithm of each note center frequency you are looking for.
Once you get one implementation of the algorithm working you can make it such that it takes parameters to set it's center frequency. With that you could easily run 88 of them or what ever you need in a collection and scan for the peak value.
The Goertzel Algorithm is basically a single bin FFT. Using this method you can place your bins logarithmically as musical notes naturally go.
Some pseudo code from Wikipedia:
s_prev = 0
s_prev2 = 0
coeff = 2*cos(2*PI*normalized_frequency);
for each sample, x[n],
s = x[n] + coeff*s_prev - s_prev2;
s_prev2 = s_prev;
s_prev = s;
end
power = s_prev2*s_prev2 + s_prev*s_prev - coeff*s_prev2*s_prev;
The two variables representing the previous two samples are maintained for the next iteration. This can be then used in a streaming application. I thinks perhaps the power calculation should be inside the loop as well. (However it is not depicted as such in the Wiki article.)
In the tone detection case there would be 88 different coeficients, 88 pairs of previous samples and would result in 88 power output samples indicating the relative level in that frequency bin.
WaveyDavey says that he's capturing sound from a mic, thru the audio hardware of his computer, BUT that his results are not zero-centered. This sounds like a problem with the hardware. It SHOULD BE zero-centered.
When the room is quiet, the stream of values coming from the sound API should be very close to 0 amplitude, with slight +- variations for ambient noise. If a vibratory sound is present in the room (e.g. a piano, a flute, a voice) the data stream should show a fundamentally sinusoidal-based wave that goes both positive and negative, and averages near zero. If this is not the case, the system has some funk going on!
-Rick