Bit-shifting audio samples from Float32 to SInt16 results in severe clipping - objective-c

I'm new to iOS and its C underpinnings, but not to programming in general. My dilemma is this: I'm implementing an echo effect in a complex AudioUnits-based application. The application needs reverb, echo, and compression, among other things. However, the echo only works correctly when I use a particular AudioStreamBasicDescription format for the audio samples generated in my app. That format, however, doesn't work with the other AudioUnits.
While there are other ways to solve this problem, fixing the bit-twiddling in the echo algorithm might be the most straightforward approach.
The AudioStreamBasicDescription that works with echo has an mFormatFlags of kAudioFormatFlagsAudioUnitCanonical. Its specifics are:
AudioUnit Stream Format (ECHO works, NO AUDIO UNITS)
Sample Rate: 44100
Format ID: lpcm
Format Flags: 3116 = kAudioFormatFlagsAudioUnitCanonical
Bytes per Packet: 4
Frames per Packet: 1
Bytes per Frame: 4
Channels per Frame: 2
Bits per Channel: 32
Set ASBD on input
Set ASBD on output
au SampleRate rate: 0.000000, 2 channels, 12 formatflags, 1819304813 mFormatID, 16 bits per channel
The stream format that works with the AudioUnits is the same except for the mFormatFlags: kAudioFormatFlagIsFloat | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked | kAudioFormatFlagIsNonInterleaved. Its specifics are:
AudioUnit Stream Format (NO ECHO, AUDIO UNITS WORK)
Sample Rate: 44100
Format ID: lpcm
Format Flags: 41
Bytes per Packet: 4
Frames per Packet: 1
Bytes per Frame: 4
Channels per Frame: 2
Bits per Channel: 32
Set ASBD on input
Set ASBD on output
au SampleRate rate: 44100.000000, 2 channels, 41 formatflags, 1819304813 mFormatID, 32 bits per channel
In order to create the echo effect I use two functions that bit-shift sample data into SInt16 space and back. As I said, this works for the kAudioFormatFlagsAudioUnitCanonical format but not the other. When it fails, the sounds are clipped and distorted, but they are there. I think this indicates that the difference between these two formats is how the data is arranged within the 32 bits of each sample.
// convert sample vector from fixed point 8.24 to SInt16
void fixedPointToSInt16( SInt32 *source, SInt16 *target, int length ) {
    int i;
    for( i = 0; i < length; i++ ) {
        target[i] = (SInt16)(source[i] >> 9);
        //target[i] *= 0.003;
    }
}
*As you can see, I tried scaling down the amplitude of the samples to get rid of the clipping -- clearly that didn't work.
// convert sample vector from SInt16 to fixed point 8.24
void SInt16ToFixedPoint( SInt16 *source, SInt32 *target, int length ) {
    int i;
    for( i = 0; i < length; i++ ) {
        target[i] = (SInt32)(source[i] << 9);
        if( source[i] < 0 ) {
            target[i] |= 0xFF000000;
        }
        else {
            target[i] &= 0x00FFFFFF;
        }
    }
}
If I can determine what kAudioFormatFlagIsFloat | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked | kAudioFormatFlagIsNonInterleaved implies about the sample layout, then I can modify the above methods accordingly. But I'm not sure how to figure that out. The CoreAudio documentation is enigmatic, but from what I've read there, and gleaned from the CoreAudioTypes.h file, both mFormatFlags values refer to the same fixed-point 8.24 format. Clearly something is different, but I can't figure out what.
Thanks for reading through this long question, and thanks in advance for any insight you can provide.

kAudioFormatFlagIsFloat means that the buffer contains floating point values. If mBitsPerChannel is 32 then you are dealing with float data (also called Float32), and if it is 64 you are dealing with double data.
kAudioFormatFlagsNativeEndian refers to the fact that the data in the buffer matches the endianness of the processor, so you don't have to worry about byte swapping.
kAudioFormatFlagIsPacked means that every bit in the data is significant. For example, if you store 24-bit audio data in 32-bit words, this flag will not be set.
kAudioFormatFlagIsNonInterleaved means that each individual buffer consists of one channel of data. It is common for audio data to be interleaved, with the samples alternating between L and R channels: LRLRLRLR. For DSP applications it is often easier to deinterleave the data and work on one channel at a time.
I think in your case the error is that you are treating floating-point data as fixed point. Float data is generally scaled to the interval [-1, +1). To convert float to SInt16 you need to multiply each sample by the maximum 16-bit magnitude (1 << 15, i.e. 32768) and then clip the result to the interval [-32768, 32767].
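For example, here is a minimal sketch of that conversion and its inverse, assuming samples normalized to [-1, +1) (the function names are mine, chosen to mirror the ones in the question):

// convert sample vector from Float32 to SInt16, with clipping
void floatToSInt16( Float32 *source, SInt16 *target, int length ) {
    int i;
    for( i = 0; i < length; i++ ) {
        SInt32 s = (SInt32)(source[i] * 32768.0f); // scale [-1, +1) up to [-32768, 32768)
        if( s >  32767 ) s =  32767;               // clip positive overflow
        if( s < -32768 ) s = -32768;               // clip negative overflow
        target[i] = (SInt16)s;
    }
}

// convert sample vector from SInt16 back to Float32 in [-1, +1)
void SInt16ToFloat( SInt16 *source, Float32 *target, int length ) {
    int i;
    for( i = 0; i < length; i++ ) {
        target[i] = source[i] / 32768.0f;
    }
}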

Related

STM32 Gyroscope angle tracking

I'm working with a gyroscope (L3GD20) with a 2000 DPS range.
Correct me if there is a mistake:
I start by reading the High and Low values for the 3 axes and concatenating them. Then I multiply every value by 0.07 to convert it into DPS.
My main goal is to track the angle over time, so I simply implemented a timer which reads the data every dt = 10 ms
and integrates ValueInDPS * 10 ms; here is the code line I'm using:
angleX += (resultGyroX)*dt*0.001; //0.001 to get dt in [seconds]
This should give us the value of the angle in [degrees], am I right?
The problem is that the values I'm getting are a little bit weird; for example, when I make a rotation of 90°, I get something like 70°...
Your method is a recipe for imprecision and accumulated error.
You should avoid using floating point (especially if there is no FPU), and especially if this code is in the timer interrupt handler.
You should also avoid unnecessarily converting to degrees/sec on every sample - that conversion is needed only for presentation, so perform it only when you need the value - internally the integrator should work in raw gyro sample units.
Additionally, if you are doing floating point in both an ISR and a normal thread and you have an FPU, you may also encounter unrelated errors, because FPU registers are not preserved and restored in an interrupt handler. All in all, floating point should be used only advisedly.
So let us assume you have a function gyroIntegrate() called precisely every 10ms:
static int32_t ax = 0 ;
static int32_t ay = 0 ;
static int32_t az = 0 ;

void gyroIntegrate( int32_t sample_x, int32_t sample_y, int32_t sample_z )
{
    ax += sample_x ;
    ay += sample_y ;
    az += sample_z ;
}
Note that ax etc. are the integration of the raw sample values, and so proportional to the angle relative to the starting position.
To convert ax to degrees:
degrees = ax × r ÷ s
Where:
r is the gyro resolution in degrees per second per digit (0.07)
s is the sample rate (100).
Now, you would do well to avoid floating point, and here it is entirely unnecessary: s ÷ r is a constant (1428.571 in this case). So to read the current angle represented by the integrator, you might have a function:
#define GYRO_SIGMA_TO_DEGREESx10 14286

void getAngleXYZ( int32_t* xdeg, int32_t* ydeg, int32_t* zdeg )
{
    *xdeg = (ax * 10) / GYRO_SIGMA_TO_DEGREESx10 ;
    *ydeg = (ay * 10) / GYRO_SIGMA_TO_DEGREESx10 ;
    *zdeg = (az * 10) / GYRO_SIGMA_TO_DEGREESx10 ;
}
getAngleXYZ() should be called from the application layer when you need a result - not from the integrator - you do the math at the point of need and have CPU cycles left to do more useful stuff.
Note that in the above I have ignored the possibility of arithmetic overflow of the integrator. As it is, it is good for approximately +/-1.5 million degrees (+/-4175 rotations), so it may not be a problem in some applications. You could use an int64_t, or, if you are not interested in the number of rotations but just the absolute angle, then in the integrator:
ax += sample_x ;
ax %= GYRO_SIGMA_360 ;
Where GYRO_SIGMA_360 equals 514286 (360 × s ÷ r).
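Putting those pieces together, here is a minimal sketch of the wrap-around variant for one axis (a sketch only; it assumes gyroIntegrateX() is called exactly every 10 ms, and note that C's % keeps the sign of the dividend, so negative angles wrap to negative values):

#define GYRO_SIGMA_360 514286        // 360 * s / r = 360 * 100 / 0.07

static int32_t ax = 0 ;

// called from the 10 ms timer; integer-only, so safe even without an FPU
void gyroIntegrateX( int32_t sample_x )
{
    ax += sample_x ;
    ax %= GYRO_SIGMA_360 ;           // wrap to within one full rotation
}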
Unfortunately, MEMS sensor math is quite complicated.
I would personally use the ready-made libraries provided by ST: https://www.st.com/en/embedded-software/x-cube-mems1.html.
I actually use them, and the results are very good.

Pass audio spectrum to a shader as texture in libGDX

I'm developing an audio visualizer using libGDX.
I want to pass the audio spectrum data (an array containing the FFT of the audio sample) to a shader I took from Shadertoy: https://www.shadertoy.com/view/ttfGzH.
In the GLSL code I expect a uniform containing the data as a texture:
uniform sampler2D iChannel0;
The problem is that I can't figure out how to pass an arbitrary array as a texture to a shader in libGDX.
I already searched in SO and in libGDX's forum but there isn't a satisfying answer to my problem.
Here is my Kotlin code (that obviously doesn't work xD):
val p = Pixmap(512, 1, Pixmap.Format.Alpha)
val t = Texture(p)
val map = p.pixels
map.putFloat(....) // fill the map with FFT data
[...]
t.bind(0)
shader.setUniformi("iChannel0", 0)
You could simply use the drawPixel method and store your data in the first channel of each pixel just like in the shadertoy example (they use the red channel).
float[] fftData = ...; // your data
Color tmpColor = new Color();
Pixmap pixmap = new Pixmap(fftData.length, 1, Pixmap.Format.RGBA8888);
for(int i = 0; i < fftData.length; i++)
{
    tmpColor.set(fftData[i], 0, 0, 0); // using only 1 channel per pixel
    pixmap.drawPixel(i, 0, Color.rgba8888(tmpColor));
}
// then create your texture and bind it to the shader
To be more efficient and use 4x less memory (and possibly fewer samples, depending on the shader), you could use 4 channels per pixel by splitting your data across the r, g, b and a channels. However, this will complicate the shader a bit.
The data being passed to the shader in the example you provided is not arbitrary, though; it has pretty limited precision and ranges between 0 and 1. If you want to increase precision, you may want to store the floating-point value across multiple channels (although the IEEE recomposition in the shader may be painful) or pass an integer to be scaled down (fixed point). If you need data between -inf and +inf, you may use sigmoid and inverse-sigmoid functions, at the cost of greatly reducing precision again. I believe this technique will work for your example, though, as it seems to only require values between 0 and 1, and precision is not super important because the result is smoothed.
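To illustrate that last idea, here is a minimal sketch of the sigmoid squash and its inverse (shown in C for brevity; the same math would run in Kotlin/Java on the CPU side and in GLSL on the GPU side, and the function names are made up):

#include <math.h>

// squash an unbounded value into (0, 1) for storage in a texture channel
float packSigmoid( float x ) {
    return 1.0f / (1.0f + expf(-x));
}

// recover the original value from the (0, 1) channel (inverse sigmoid, i.e. logit)
float unpackSigmoid( float y ) {
    return logf(y / (1.0f - y));
}

Values far from zero land in the flat tails of the sigmoid, which is where the precision loss mentioned above comes from.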

Extra bytes on the end of YUV buffer - RaspberryPi

I've started editing the RaspiStillYUV.c code. I eventually want to process the image I receive, but for now, I'm just working to understand it. Why am I working with YUV instead of RGB? So I can learn something new. I've made minor changes to the function camera_buffer_callback. All I am doing is the following:
fprintf(stderr, "GREAT SUCCESS! %d\n", buffer->length);
The line this is replacing:
bytes_written = fwrite(buffer->data, 1, buffer->length, pData->file_handle);
Now, the dimensions should be 2592 x 1944 (w x h) as set in the code. Working off of Wikipedia (YUV420), I have come to the conclusion that the file size should be w * h * 1.5, since the Y component has 1 byte of data for each pixel and the U and V components each have 1 byte of data for every 4 pixels (1 + 1/4 + 1/4 = 1.5). Great. Doing the math in Python:
>>> 2592 * 1944 * 1.5
7558272.0
Unfortunately, this does not line up with the output of my program:
GREAT SUCCESS! 7589376
That leaves a difference of 31104 bytes.
I figure that the buffer is allocated in fixed-size chunks (the output size is evenly divisible by 512). While I would like to understand that mystery, I'm fine with the fixed-size-chunk explanation.
My question is if I am missing something. Are the extra bytes beyond the expected size meaningful in this format? Should they be ignored? Are my calculations off?
The documentation at this location supports your theory on padding: http://www.raspberrypi.org/wp-content/uploads/2013/07/RaspiCam-Documentation.pdf
Specifically:
Note that the image buffers saved in raspistillyuv are padded to a
horizontal size divisible by 16 (so there may be unused bytes at the
end of each line to made the width divisible by 16). Buffers are also
padded vertically to be divisible by 16, and in the YUV mode, each
plane of Y,U,V is padded in this way.
So my interpretation of this is the following.
The width is 2592 (divisible by 16, so this is fine).
The height is 1944, which is 8 short of being divisible by 16, so an extra 8 * 2592 bytes are added (also multiplied by 1.5), giving your 31104 extra bytes.
Although this kind of helps with the size of the file, it doesn't explain the structure of the YUV output properly. I am having a look at this description to see if it provides a hint to start with: http://en.wikipedia.org/wiki/YUV#Y.27UV420p_.28and_Y.27V12_or_YV12.29_to_RGB888_conversion
From this I believe it is as follows:
Y Channel:
2592 * (1944+8) = 5059584
U Channel:
1296 * (972+4) = 1264896
V Channel:
1296 * (972+4) = 1264896
Giving a sum of:
5059584 + 2*1264896 = 7589376
This makes the numbers add up so only thing left is to confirm if this interpretation is correct.
I am also trying to do the YUV decode (for image comparisons) so if you can confirm if this actually does correspond to what you are reading in the YUV file this would be much appreciated.
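To make that interpretation concrete, here is a small sketch in C of the buffer-size calculation described above (the function names are made up; it simply rounds each plane's dimensions up to a multiple of 16):

#include <stddef.h>

// round n up to the next multiple of 16
static size_t pad16( size_t n ) {
    return (n + 15) & ~(size_t)15;
}

// padded YUV420 buffer size: a full-resolution Y plane plus two half-resolution chroma planes
size_t yuv420PaddedSize( size_t width, size_t height ) {
    size_t y  = pad16(width)     * pad16(height);     // 2592 * 1952 = 5059584
    size_t uv = pad16(width / 2) * pad16(height / 2); // 1296 * 976  = 1264896
    return y + 2 * uv;                                // 7589376 for 2592 x 1944
}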
You have to read the manual carefully. Buffers are padded to multiples of 16, but the colour data is half-size, so your image dimensions need to be multiples of 32 to avoid the padding breaking external software.

How to convert GPS Longitude and latitude from hex

I am trying to convert GPS data from a GPS tracking device. The company provided a protocol manual, but it's not clear. Most of the data I was able to decode from the packets I received from the device. The communication is over TCP/IP. I am having a problem decoding the hex value of the longitude and latitude. Here is an example from the manual:
Example: 22º32.7658’=(22X60+32.7658)X3000=40582974, then converted into a hexadecimal number
40582974(Decimal)= 26B3F3E(Hexadecimal)
at last the value is 0x02 0x6B 0x3F 0x3E.
I would like to know how to reverse from the hexadecimal back to longitude and latitude. The device will send 26B3F3E; I want to know the process of getting 22º32.7658.
This protocol applies to GT06 and Heacent 908.
1. Store all four bytes in unsigned 32-bit variables: v1 = 0x02, v2 = 0x6B, v3 = 0x3F, v4 = 0x3E.
2. Compute (v1 << 24) | (v2 << 16) | (v3 << 8) | v4; this will yield a variable holding the value 40582974 decimal.
3. Convert this to a float and divide it by 30,000.0 (the manual's 3,000 was an error); this will give you 1352.7658.
4. Chop to integer and divide by 60. This will give you the 22.
5. Multiply the number you got in step 4 by 60 and subtract it from the number you got in step 3. This will give you 1352.7658 - 22*60, or 32.7658.
There's your answer: 22, 32.7658.
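Here is a minimal sketch in C of that procedure end to end (the function name is made up; it assumes the four bytes arrive most significant first, as in the example):

#include <stdio.h>
#include <stdint.h>

// decode four raw bytes into whole degrees and decimal minutes
void decodeCoordinate( const uint8_t b[4], int *degrees, double *minutes ) {
    uint32_t raw = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
                 | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    double totalMinutes = raw / 30000.0;        // 40582974 / 30000 = 1352.7658
    *degrees = (int)(totalMinutes / 60.0);      // 22
    *minutes = totalMinutes - *degrees * 60.0;  // 32.7658
}

int main( void ) {
    const uint8_t b[4] = { 0x02, 0x6B, 0x3F, 0x3E };
    int deg;
    double min;
    decodeCoordinate( b, &deg, &min );
    printf( "%d deg %.4f min\n", deg, min );    // prints: 22 deg 32.7658 min
    return 0;
}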

How can I encode four unsigned bytes (0-255) to a float and back again using HLSL?

I am facing a task where one of my HLSL shaders requires multiple texture lookups per pixel. My 2D textures are fixed at 256*256, so two bytes should be sufficient to address any given texel given this constraint. My idea is then to put two xy-coordinates in each float, giving me eight xy-coordinates in pixel space when packed in a Vector4-format image. These eight coordinates are then used to sample another texture (or textures).
The reason for doing this is to save graphics memory and to try to optimize processing time, since I then don't require multiple texture lookups.
By the way: Does anyone know if encoding/decoding 16 bytes from/to 4 floats using 1 sampling is slower than 4 samplings with unencoded data?
Edit: This is for Shader Model 3
If you are targeting SM 4.0-SM 5.0 you can use binary casts and bitwise operations:
uint4 myUInt4 = asuint(myFloat4);
uint x0 = myUInt4.x & 0x000000FF; //extract two xy-coordinates (x0,y0), (x1,y1)
uint y0 = (myUInt4.x & 0x0000FF00) >> 8;
uint x1 = (myUInt4.x & 0x00FF0000) >> 16;
uint y1 = (myUInt4.x & 0xFF000000) >> 24;
//repeat operation for .y .z and .w coordinates
For previous Shader Models, I think it could be more complicated since it depends on FP precision.
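For completeness, here is a small sketch in C of the CPU-side packing that the asuint() extraction above would undo (the function name is made up; note that arbitrary bit patterns can form NaNs, so the value must travel through a format that preserves the bits exactly, such as a 32-bit float texture):

#include <stdint.h>
#include <string.h>

// pack two 8-bit xy-coordinates into one float's bit pattern
float packCoords( uint8_t x0, uint8_t y0, uint8_t x1, uint8_t y1 ) {
    uint32_t bits = (uint32_t)x0          // bits 0-7,   extracted as x0 in the shader
                  | ((uint32_t)y0 << 8)   // bits 8-15,  extracted as y0
                  | ((uint32_t)x1 << 16)  // bits 16-23, extracted as x1
                  | ((uint32_t)y1 << 24); // bits 24-31, extracted as y1
    float f;
    memcpy( &f, &bits, sizeof f );        // reinterpret the bits; not a numeric cast
    return f;
}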