This:
gdallocationinfo -valonly -wgs84 file longitude latitude
provides the value from the file at the resolved pixel.
Is there a gdal function that can provide an interpolated value from the neighbouring pixels?
For example these calls read elevation from Greenwich Park in London:
gdallocationinfo -wgs84 srtm_36_02.tif 0 51.4779
47
gdallocationinfo -wgs84 srtm_36_02.tif 0 51.4780
37
That's a 10 metre drop in elevation for a movement of 0.0001°, about 11 metres north.
The pixels in the file are quite coarse - corresponding to about 80 metres on the ground. I want to get smoother values out rather than sudden big jumps.
The workaround I'm currently using is to resample the source file at four times the resolution using this transformation:
gdalwarp -ts 24004 24004 -r cubicspline srtm_36_02.tif srtm_36_02_cubicspline_x4.tiff
The elevation requests for the same locations as before, using the new file:
gdallocationinfo -wgs84 srtm_36_02_cubicspline_x4.tiff 0 51.4779
43
gdallocationinfo -wgs84 srtm_36_02_cubicspline_x4.tiff 0 51.4780
41
which is much better as that is only a 2 metre jump.
The downside of this approach is that it takes a few minutes to generate the higher resolution file, but the main problem is that the file size goes from 69MB to 1.1GB.
I'm surprised that resampling is not a direct option to gdallocationinfo, or maybe there is another approach I can use?
You can write a Python or a Node.js script to do this; it would be only a few lines of code, as GDAL's RasterIO can resample on the fly.
Node.js would go like this:
const gdal = require('gdal-async');
const cellSize = 4; // This is your resampling factor
const [x, y] = [0, 51.4779]; // longitude, latitude of the point to sample
const ds = gdal.open('srtm_36_02.tif');
// Transform from WGS84 to raster pixel coordinates
const xform = new gdal.CoordinateTransformation(
  gdal.SpatialReference.fromEPSG(4326), ds);
const coords = xform.transformPoint({ x, y });
// Read a cellSize x cellSize window into a 1x1 buffer,
// letting GDAL resample on the fly
const data = ds.bands.get(1).pixels.read(
  Math.floor(coords.x - cellSize / 2), // window origin must be whole pixels
  Math.floor(coords.y - cellSize / 2),
  cellSize,
  cellSize,
  undefined, // Let GDAL allocate an output buffer
  { buffer_width: 1, buffer_height: 1 } // of this size
);
console.log(data[0]);
For brevity I have omitted clamping of the coordinates when you are near the edges; in that case you have to reduce the window size.
(disclaimer: I am the author of the Node.js bindings for GDAL)
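For reference, a rough Python counterpart using the standard osgeo.gdal bindings might look like the sketch below; band.ReadAsArray with a 1x1 output buffer and a resample_alg asks GDAL to resample the window on the fly. The file name and window size are the ones from the question; treat this as a sketch rather than tested, drop-in code.
from osgeo import gdal, osr

cell_size = 4  # resampling factor
lon, lat = 0.0, 51.4779

ds = gdal.Open('srtm_36_02.tif')
# WGS84 point -> the raster's georeferenced coordinates
src = osr.SpatialReference()
src.ImportFromEPSG(4326)
src.SetAxisMappingStrategy(osr.OAMS_TRADITIONAL_GIS_ORDER)  # keep lon/lat order on GDAL 3+
dst = osr.SpatialReference(wkt=ds.GetProjection())
x, y, _ = osr.CoordinateTransformation(src, dst).TransformPoint(lon, lat)

# ...then to pixel coordinates via the inverse geotransform
inv_gt = gdal.InvGeoTransform(ds.GetGeoTransform())
px, py = gdal.ApplyGeoTransform(inv_gt, x, y)

# Read a cell_size x cell_size window into a 1x1 buffer; GDAL resamples on the fly
value = ds.GetRasterBand(1).ReadAsArray(
    int(px - cell_size / 2), int(py - cell_size / 2),
    cell_size, cell_size,
    buf_xsize=1, buf_ysize=1,
    resample_alg=gdal.GRIORA_CubicSpline)[0, 0]
print(value)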
You may try to get a 1-pixel raster from gdalwarp. This would use all the warp resampling machinery with minimal impact on RAM/CPU/disk. I am using this (inside a Python program, since the calculations may be a bit too complex for a shell script), and it does work.
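A minimal sketch of that approach, assuming the osgeo.gdal Python bindings and gdal.Warp with an in-memory output (the 0.001-degree window around the query point is a hypothetical choice, not something prescribed by gdalwarp):
from osgeo import gdal

lon, lat = 0.0, 51.4779
half = 0.001  # hypothetical half-width of the warp window, in degrees

# Warp a tiny window around the point into a 1x1 in-memory raster,
# so the interpolated value comes from the warp resampling machinery.
out = gdal.Warp('', 'srtm_36_02.tif',
                format='MEM',
                dstSRS='EPSG:4326',
                outputBounds=(lon - half, lat - half, lon + half, lat + half),
                width=1, height=1,
                resampleAlg='cubicspline')
print(out.ReadAsArray()[0, 0])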
I'm working with a gyroscope (L3GD20) with a 2000 DPS range.
Correct me if there is a mistake.
I start by reading the High and Low register values for the 3 axes and concatenating them. Then I multiply every value by 0.07 to convert it into DPS.
My main goal is to track the angle over time, so I simply implemented a timer which reads the data every dt = 10 ms to integrate ValueInDPS * 10 ms. Here is the line of code I'm using:
angleX += (resultGyroX)*dt*0.001; //0.001 to get dt in [seconds]
This should give us the value of the angle in [degrees], am I right?
The problem is that the values I'm getting are a little bit weird, for example when I make a rotation of 90°, I get something like 70°...
Your method is a recipe for imprecision and accumulated error.
You should avoid using floating point (especially if there is no FPU), and even more so if this code is in the timer interrupt handler.
You should also avoid unnecessarily converting to degrees/sec on every sample - that conversion is needed only for presentation, so perform it only when you need the value - internally the integrator should work in gyro sample units.
Additionally, if you are doing floating point in both an ISR and a normal thread and you have an FPU, you may also encounter unrelated errors, because FPU registers are not preserved and restored in an interrupt handler. All in all, floating point should only be used advisedly.
So let us assume you have a function gyroIntegrate() called precisely every 10ms:
static int32_t ax = 0 ;
static int32_t ay = 0 ;
static int32_t az = 0 ;
void gyroIntegrate( int32_t sample_x, int32_t sample_y, int32_t sample_z )
{
    ax += sample_x ;
    ay += sample_y ;
    az += sample_z ;
}
Now ax etc. are the integration of the raw sample values and so proportional to the angle relative to the starting position.
To convert ax to degrees:
degrees = (ax × r) / s
Where:
r is the gyro resolution in degrees per second per LSB (0.07)
s is the sample rate in samples per second (100).
Now you would do well to avoid floating point, and here it is entirely unnecessary: s / r is a constant (1428.571 in this case). So to read the current angle represented by the integrator, you might have a function:
#define GYRO_SIGMA_TO_DEGREESx10 14286
void getAngleXYZ( int32_t* xdeg, int32_t* ydeg, int32_t* zdeg )
{
    *xdeg = (ax * 10) / GYRO_SIGMA_TO_DEGREESx10 ;
    *ydeg = (ay * 10) / GYRO_SIGMA_TO_DEGREESx10 ;
    *zdeg = (az * 10) / GYRO_SIGMA_TO_DEGREESx10 ;
}
getAngleXYZ() should be called from the application layer when you need a result - not from the integrator - you do the math at the point of need and leave CPU cycles free for more useful work.
Note that in the above I have ignored the possibility of arithmetic overflow of the integrator. As it is, it is good for approximately ±1.5 million degrees (about ±4175 rotations), so it may not be a problem in some applications. You could use an int64_t, or if you are not interested in the number of rotations, just the absolute angle, then in the integrator:
ax += sample_x ;
ax %= GYRO_SIGMA_360 ;
Where GYRO_SIGMA_360 equals 514286 (360 x s / r).
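As a quick numeric check of those constants (plain arithmetic, shown in Python only for convenience):
r = 0.07   # gyro resolution in degrees/second per LSB (from the question)
s = 100    # sample rate in samples per second (10 ms timer)
print(s / r)        # 1428.57... -> GYRO_SIGMA_TO_DEGREESx10 = 14286 after the x10 scaling
print(360 * s / r)  # 514285.7... -> GYRO_SIGMA_360 = 514286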
Unfortunately, MEMS sensor math is quite complicated.
I would personally use the ready-made libraries provided by ST: https://www.st.com/en/embedded-software/x-cube-mems1.html.
I actually use them, and the results are very good.
I have the following code taken from this tutorial.
def get_spectrogram(waveform):
zero_padding = tf.zeros([4900] - tf.shape(waveform), dtype=tf.float32)
waveform = tf.cast(waveform, tf.float32)
equal_length = tf.concat([waveform, zero_padding], 0)
spectrogram = tf.signal.stft(equal_length, frame_length=256, frame_step=128)
spectrogram = tf.abs(spectrogram)
return spectrogram
spectrogram = get_spectrogram(waveform)
print('Spectrogram shape:', spectrogram.shape)
And I get the following output for the spectrogram shape.
Spectrogram shape: (37, 129)
What does the first and second value mean?
If I have 4900 samples and a frame_step of 128. Shouldn't the first value be 38?
4900/128 = 38.28125 -> 38 rounded
It also happens that with a Kotlin library I get a shape of (38, 127).
I need to understand this, since I am implementing a model on Android with TFLite and therefore I am pre-processing the data on the mobile device.
I'm not familiar exactly with the Python API, but assuming it does something similar to WaveBeans, which I'm very familiar with, what you've got is a 2-dimensional matrix.
What you're doing is a Short-Time Fourier Transform, which is basically taking the FFT over time. Whilst the FFT magnitude or phase of a single frame can be represented as a 1-dimensional vector, the STFT also has a time axis, which is why it is a 2-dimensional matrix.
So the 38 side is the time index, the 127 side is the frequency index, and the values are the FFT values for a specific time-frequency bin, though those are complex numbers. Thinking of them in polar coordinates, the phase is the angle and the magnitude is the length. In your code you're getting the magnitude by calling tf.abs(), so you've already got rid of the complex number representation.
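For the specific numbers in the question, the shape can be reproduced with the usual STFT framing rule of one frame per frame_step over the samples that fully fit, plus frame_length/2 + 1 one-sided frequency bins (a small Python check, matching the (37, 129) you printed):
n_samples = 4900
frame_length = 256
frame_step = 128
n_frames = 1 + (n_samples - frame_length) // frame_step  # only frames that fully fit
n_bins = frame_length // 2 + 1                           # one-sided spectrum
print(n_frames, n_bins)  # 37 129
The difference from your 4900/128 ≈ 38 estimate is that each frame must contain a full frame_length of samples, so the count is taken over n_samples - frame_length rather than n_samples.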
Within WaveBeans there is an API to work with FFT specifically to extract out the phase and magnitude, as well as frequency values, and time values.
So, just to keep the answer complete, I'll provide a code snippet:
// let's take simple sine as an example
val waveformAsAStream = 440.sine().trim(1000)
val fftStream = waveformAsAStream
.window(256,128)
// zero padding is already done inside, but if window.size == fft.size it doesn't really do anything
.fft(256)
// evaluate it, for example as a kotlin sequence
val stft = fftStream.asSequence(44100.0f)
.toList()
// get the specific sample for the sake of the demonstration
val fftSample = stft.drop(10).first()
// get time in nano seconds
fftSample.time()
// outputs the time of the taken sample:
// 29024943
// get frequencies values
fftSample.frequency().toList()
// outputs a list of size 128, each element is a frequency in Hz :
// [0.0, 172.265625, 344.53125, 516.796875, 689.0625, ..., 21360.9375, 21533.203125, 21705.46875, 21877.734375]
// get magnitude values
fftSample.magnitude().toList()
// outputs a list of size 128, each element is magnitude value for specific bin in dB:
// [29.629418039768613, 31.125367384785786, 38.077554502661705, 38.480916556622745, ..., -11.57802246867041]
// the index of the closest bin (index) of the frequency
fftSample.bin(440.0)
// outputs:
// 3
// get the magnitude in the FFT spectrogram of the specific frequency
fftSample.magnitude().toList()[fftSample.bin(440.0)]
// outputs:
// 38.480916556622745
Although, for a better FFT output, I would recommend using window functions (Hamming is a popular one, for example) and smaller windows (zero padding will handle the alignment in that case, since the FFT requires a specific input length), i.e. something like this:
waveformAsAStream
.window(101, 85)
.hamming()
.fft(256)
If you want to play around with the values, you can use a Kotlin Jupyter notebook with the WaveBeans library; check it out on GitHub.
I am using NAudio with the SineWaveProvider32 code directly from http://mark-dot-net.blogspot.com/2009/10/playback-of-sine-wave-in-naudio.html to generate sine wave tones. The relevant code in the SineWaveProvider32 class:
public override int Read(float[] buffer, int offset, int sampleCount)
{
int sampleRate = WaveFormat.SampleRate;
for (int n = 0; n < sampleCount; n++)
{
buffer[n + offset] =
(float)(Amplitude * Math.Sin((2 * Math.PI * sample * Frequency) / sampleRate));
sample++;
if (sample >= sampleRate) sample = 0;
}
return sampleCount;
}
I was getting clicks/beats every second, so I changed
if (sample >= sampleRate) sample = 0;
to
if (sample >= (int)(sampleRate / Frequency)) sample = 0;
This fixed the clicks every second (so that "sample" was always relative to a zero-crossing, not the sample rate).
However, whenever I set the Amplitude variable, I get a click. I tried setting it only when the buffer[] was at a zero-crossing,
thinking that a sudden jump in amplitude might be causing the problem. That did not solve the problem. I am setting the Amplitude to values between
0.25 and 0.0
I tried adjusting the latency and number of buffers as suggested in NAudio change volume in runtime, but that had no effect either.
My code that changes the Amplitude:
public async void play(int durationMS, float amplitude = .25f)
{
PitchPlayer pPlayer = new PitchPlayer(this.frequency, amplitude);
pPlayer.play();
await Task.Delay(durationMS/2);
pPlayer.provider.Amplitude = .15f;
await Task.Delay(durationMS /2);
pPlayer.stop();
}
The clicks are caused by a discontinuity in the waveform. This is hard to fix in a class like this, because ideally you would slowly ramp the volume from one value to the other. This can be done by modifying the code to have a target amplitude; if the current amplitude is not equal to the target amplitude, you move towards it by a small delta amount calculated each time through the loop. So over a period of, say, 10 ms, you move from the old amplitude to the new one. But you'd need to write this yourself, unfortunately.
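To illustrate the ramping idea in isolation, here is a language-agnostic sketch written in Python with made-up names (it is not the NAudio API or the code from the blog post):
import math

def fill_buffer(buffer, sample_rate, frequency, state):
    # state carries 'amplitude', 'target_amplitude' and 'phase' between calls
    ramp_seconds = 0.01                        # spread an amplitude change over ~10 ms
    step = 1.0 / (ramp_seconds * sample_rate)  # per-sample delta for a full-scale change
    for n in range(len(buffer)):
        diff = state['target_amplitude'] - state['amplitude']
        if abs(diff) <= step:
            state['amplitude'] = state['target_amplitude']     # close enough: snap to target
        else:
            state['amplitude'] += step if diff > 0 else -step  # move a little towards it
        buffer[n] = state['amplitude'] * math.sin(state['phase'])
        state['phase'] += 2 * math.pi * frequency / sample_rate
    return buffer
Changing state['target_amplitude'] from the calling code then produces a short fade instead of an instantaneous step.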
For a similar concept where the frequency is being changed gradually rather than the amplitude, take a look at my blog post on portamento in NAudio.
Angular speed
Instead of frequency, it is easier to think in terms of angular speed: how much to increase the angular argument of the sin() function for each sample.
When using radians for the angle, one period completing a full circle is 2*pi, so the angular velocity corresponding to a frequency f is (2*pi)/T = 2*pi*f; for one Hz that is 1*2*pi [rad/s].
The sample rate is in [samples per second] and the angular velocity is in [radians per second] so to get the [angle per sample] you simply divide angular velocity by sample rate to get [radians/second]/[samples/second] = [radians/sample].
That is the number by which to continuously increase the angle of the sin() function for each sample - no multiplication is needed.
To sweep from one frequency to another you simply move from one angular increment to another in small steps over a number of samples.
By sweeping between frequencies there will be a continuous chain of adjacent samples, and the transient is spread out smoothly over time.
Moving from one amplitude to another could also be spread out over multiple samples to avoid sharp transients.
Fading in and out, incrementally adjusting the amplitude at the start and end of a sound, is more graceful than stepping the output from one level to another in a single sample.
Sharp steps produce rings on the water that propagate out in the world.
About sin() calculations
For speedy calculations it may be better to rotate a vector whose length is the amplitude, and to calculate sn = sin(delta), cs = cos(delta) only when the angular speed changes:
Wikipedia Link to theory
where amplitude^2 = x^2 + y^2, each new sample can be calculated as:
px = x * cs - y * sn;
py = x * sn + y * cs;
To increase the amplitude you simply multiply px and py by a factor, say 1.01. To make the next sample you set x = px, y = py and run the px, py calculation again, with cs and sn staying the same the whole time.
Either py or px can be used as the signal output; they are 90 degrees out of phase with each other.
On the first sample you can set x=amplitude and y=0.
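A short Python sketch of that recurrence (the 44100 Hz sample rate and 440 Hz tone are just example values):
import math

sample_rate = 44100.0
frequency = 440.0
amplitude = 0.25

delta = 2 * math.pi * frequency / sample_rate  # angle increment per sample
sn, cs = math.sin(delta), math.cos(delta)      # recomputed only when the frequency changes

x, y = amplitude, 0.0  # first sample: x = amplitude, y = 0
samples = []
for _ in range(1000):
    px = x * cs - y * sn
    py = x * sn + y * cs
    x, y = px, py
    samples.append(py)  # py (or px) is the output; they are 90 degrees apart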
I'm currently using python 3.3 in combination with pyaudio and numpy. I took the example from the pyaudio website to play a simple wave file and send that data onto the default sound card.
Now I would like to change the volume of the audio, but when I multiply the array by 0.5, I get a lot of noise and distortion.
Here is a code sample:
while data != b'':
decodeddata = numpy.fromstring(data, numpy.int16)
newdata = (decodeddata * 0.5).astype(numpy.int16)
stream.write(newdata.tostring())
data = wf.readframes(CHUNK)
How should I handle multiplication or division on this array without ruining the waveform?
Thanks,
It turned out that the source file's bit depth (24-bit) was not compatible with PortAudio. After exporting to a 16-bit PCM file, the multiplication did not result in distortion.
To handle files with different bit depths, it is necessary to check the bit depth and rescale accordingly.
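A sketch of that check using the sample width reported by the wave module (the function name is made up; 24-bit packed frames have no matching NumPy dtype, which is why re-exporting to 16-bit, as above, is the simple fix):
import numpy as np

def scale_frames(frames, sample_width, factor=0.5):
    # sample_width is wf.getsampwidth() in bytes
    if sample_width == 2:        # 16-bit PCM
        dtype = np.int16
    elif sample_width == 4:      # 32-bit PCM
        dtype = np.int32
    else:
        # 8-bit unsigned and 24-bit packed PCM need their own conversion first
        raise ValueError("unsupported sample width: %d bytes" % sample_width)
    data = np.frombuffer(frames, dtype=dtype)
    return (data * factor).astype(dtype).tobytes()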
I've started editing the RaspiStillYUV.c code. I eventually want to process the image I receive, but for now, I'm just working to understand it. Why am I working with YUV instead of RGB? So I can learn something new. I've made minor changes to the function camera_buffer_callback. All I am doing is the following:
fprintf(stderr, "GREAT SUCCESS! %d\n", buffer->length);
The line this is replacing:
bytes_written = fwrite(buffer->data, 1, buffer->length, pData->file_handle);
Now, the dimensions should be 2592 x 1944 (w x h) as set in the code. Working off of Wikipedia (YUV420), I have come to the conclusion that the file size should be w * h * 1.5, since the Y component has 1 byte of data for each pixel and the U and V components each have 1 byte of data for every 4 pixels (1 + 1/4 + 1/4 = 1.5). Great. Doing the math in Python:
>>> 2592 * 1944 * 1.5
7558272.0
Unfortunately, this does not line up with the output of my program:
GREAT SUCCESS! 7589376
That leaves a difference of 31104 bytes.
I figure that the buffer is allocated in fixed size chunks (the output size is evenly divisible by 512). While I would like to understand that mystery, I'm fine with the fixed size chunk explanation.
My question is if I am missing something. Are the extra bytes beyond the expected size meaningful in this format? Should they be ignored? Are my calculations off?
The documentation at this location supports your theory on padding: http://www.raspberrypi.org/wp-content/uploads/2013/07/RaspiCam-Documentation.pdf
Specifically:
Note that the image buffers saved in raspistillyuv are padded to a
horizontal size divisible by 16 (so there may be unused bytes at the
end of each line to made the width divisible by 16). Buffers are also
padded vertically to be divisible by 16, and in the YUV mode, each
plane of Y,U,V is padded in this way.
So my interpretation of this is the following.
The width is 2592 (divisible by 16 so this is ok).
The height is 1944, which is 8 short of being divisible by 16, so an extra 8*2592 bytes are added (also multiplied by 1.5), thus giving your 31104 extra bytes.
Although this kind of helps with the size of the file, it doesn't explain the structure of the YUV output properly. I am having a look at this description to see if it provides a hint to start with: http://en.wikipedia.org/wiki/YUV#Y.27UV420p_.28and_Y.27V12_or_YV12.29_to_RGB888_conversion
From this I believe it is as follows:
Y Channel:
2592 * (1944+8) = 5059584
U Channel:
1296 * (972+4) = 1264896
V Channel:
1296 * (972+4) = 1264896
Giving a sum of :
5059584 + 2*1264896 = 7589376
This makes the numbers add up, so the only thing left is to confirm whether this interpretation is correct.
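A small Python check of that arithmetic, padding each plane's dimensions up to a multiple of 16 as the quoted documentation describes:
def pad16(n):
    return (n + 15) // 16 * 16

def padded_yuv420_size(width, height):
    y_plane = pad16(width) * pad16(height)
    uv_plane = pad16(width // 2) * pad16(height // 2)  # U and V are half resolution
    return y_plane + 2 * uv_plane

print(padded_yuv420_size(2592, 1944))  # 7589376, matching the observed buffer length
print(2592 * 1944 * 3 // 2)            # 7558272, the unpadded size from the question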
I am also trying to do the YUV decode (for image comparisons), so if you can confirm that this actually corresponds to what you are reading in the YUV file, that would be much appreciated.
You have to read the manual carefully. Buffers are padded to multiples of 16, but colour data is half-size, so your image size needs to be in multiples of 32 to avoid problems with padding breaking external software.