How can I encode four unsigned bytes (0-255) to a float and back again using HLSL?

I am facing a task where one of my HLSL shaders requires multiple texture lookups per pixel. My 2D textures are fixed at 256×256, so two bytes should be sufficient to address any given texel under this constraint. My idea is then to put two xy-coordinates in each float, giving me eight xy-coordinates in pixel space when packed in a Vector4 format image. These eight coordinates are then used to sample other texture(s).
The reason for doing this is to save graphics memory and to try to optimize processing time, since I then don't require multiple texture lookups.
By the way: does anyone know if encoding/decoding 16 bytes from/to 4 floats using one sampling is slower than four samplings with unencoded data?
Edit: This is for Shader Model 3

If you are targeting SM 4.0-SM 5.0, you can use binary casts and bitwise operations:
uint4 myUInt4 = asuint(myFloat4);
uint x0 = myUInt4.x & 0x000000FF; //extract two xy-coordinates (x0,y0), (x1,y1)
uint y0 = (myUInt4.x & 0x0000FF00) >> 8;
uint x1 = (myUInt4.x & 0x00FF0000) >> 16;
uint y1 = (myUInt4.x & 0xFF000000) >> 24;
//repeat operation for .y .z and .w coordinates
For previous Shader Models, I think it could be more complicated since it depends on FP precision.
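To gauge what that means in practice: asuint() does not exist before SM 4.0, so an SM3 shader has to pack and unpack with float arithmetic (floor(), frac(), fmod()), and a 32-bit float carries only a 24-bit mantissa. Consequently at most three full bytes survive a round trip exactly; four bytes per float, i.e. two xy pairs, will lose low-order bits. Here is a minimal C sketch of that arithmetic (an illustration of the limit, not shader code):
#include <stdio.h>

/* Pack three bytes into one float exactly. A 32-bit float has a 24-bit
   mantissa, so integers up to 2^24 - 1 round-trip exactly; a fourth
   byte would exceed the mantissa and corrupt the low-order bits. */
static float pack3( unsigned b0, unsigned b1, unsigned b2 )
{
    return (float)( b0 + ( b1 << 8 ) + ( b2 << 16 ) );
}

static void unpack3( float f, unsigned *b0, unsigned *b1, unsigned *b2 )
{
    /* A pixel shader would use fmod()/floor() instead of integer ops. */
    unsigned v = (unsigned)f;
    *b0 = v & 0xFF;            /* fmod(f, 256)              */
    *b1 = ( v >> 8 ) & 0xFF;   /* fmod(floor(f / 256), 256) */
    *b2 = ( v >> 16 ) & 0xFF;
}

int main( void )
{
    unsigned x0, y0, x1;
    unpack3( pack3( 12, 200, 255 ), &x0, &y0, &x1 );
    printf( "%u %u %u\n", x0, y0, x1 ); /* prints: 12 200 255 */
    return 0;
}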

Related

STM32 Gyroscope angle tracking

I'm working with a gyroscope (L3GD20) with a 2000 DPS full-scale range.
Correct me if there is a mistake.
I start by reading the high and low bytes for the 3 axes and concatenating them. Then I multiply every value by 0.07 to convert it into DPS.
My main goal is to track the angle over time, so I simply implemented a timer which reads the data every dt = 10 ms
to integrate ValueInDPS * 10 ms; here is the code line I'm using:
angleX += (resultGyroX)*dt*0.001; //0.001 to get dt in [seconds]
This should give us the value of the angle in [degrees], am I right?
The problem is that the values I'm getting are a little bit weird; for example, when I make a rotation of 90°, I get something like 70°...
Your method is a recipe for imprecision and accumulated error.
You should avoid using floating point (especially if there is no FPU), and especially if this code is in the timer interrupt handler.
You should also avoid unnecessarily converting to degrees/second on every sample - that conversion is needed only for presentation, so perform it only when you need the value; internally, the integrator should work in gyro sample units.
Additionally, if you are doing floating point in both an ISR and in a normal thread and you have an FPU, you may also encounter unrelated errors, because FPU registers are not preserved and restored in an interrupt handler. All in all, floating point should only be used advisedly.
So let us assume you have a function gyroIntegrate() called precisely every 10 ms:
static int32_t ax = 0 ;
static int32_t ay = 0 ;
static int32_t az = 0 ;

void gyroIntegrate( int32_t sample_x, int32_t sample_y, int32_t sample_z )
{
    ax += sample_x ;
    ay += sample_y ;
    az += sample_z ;
}
Now ax etc. are the integration (running sum) of the raw sample values and so are proportional to the angle relative to the starting position.
To convert ax to degrees:
degrees = (ax × r) / s
Where:
r is the gyro resolution in degrees/second per LSB (0.07),
s is the sample rate in Hz (100).
Now you would do well to avoid floating point, and here it is entirely unnecessary; s / r is a constant (1428.571 in this case). So to read the current angle represented by the integrator, you might have a function:
#define GYRO_SIGMA_TO_DEGREESx10 14286
void getAngleXYZ( int32_t* xdeg, int32_t* ydeg, int32_t* zdeg )
{
    *xdeg = (ax * 10) / GYRO_SIGMA_TO_DEGREESx10 ;
    *ydeg = (ay * 10) / GYRO_SIGMA_TO_DEGREESx10 ;
    *zdeg = (az * 10) / GYRO_SIGMA_TO_DEGREESx10 ;
}
getAngleXYZ() should be called from the application layer when you need a result - not from the integrator; you do the math at the point of need and have CPU cycles left to do more useful stuff.
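For example, a minimal usage sketch (the printf() is just a stand-in for whatever presentation layer you have):
#include <stdio.h>
#include <stdint.h>

void showAttitude( void )
{
    int32_t xdeg, ydeg, zdeg ;

    getAngleXYZ( &xdeg, &ydeg, &zdeg ) ;
    printf( "X:%ld Y:%ld Z:%ld degrees\n", (long)xdeg, (long)ydeg, (long)zdeg ) ;
}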
Note that in the above I have ignored the possibility of arithmetic overflow of the integrator. As it is, it is good for approximately ±1.5 million degrees (±4175 rotations), so it may not be a problem in some applications. You could use an int64_t, or, if you are not interested in the number of rotations but just the absolute angle, then in the integrator:
ax += sample_x ;
ax %= GYRO_SIGMA_360 ;
Where GYRO_SIGMA_360 equals 514286 (360 × s / r).
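Putting those pieces together for a single axis, a minimal sketch might look as follows. Note that in C the % operator yields a negative remainder for a negative dividend, so the read-out below normalises the angle into [0, 360) - an assumption about how you want the angle presented:
#include <stdint.h>

#define GYRO_SIGMA_TO_DEGREESx10 14286
#define GYRO_SIGMA_360           514286L   /* 360 × s / r */

static int32_t ax = 0 ;

/* Called from the 10 ms timer: accumulate raw samples, wrap at one turn. */
void gyroIntegrateX( int32_t sample_x )
{
    ax += sample_x ;
    ax %= GYRO_SIGMA_360 ;
}

/* Called at the point of need: convert to degrees in [0, 360). */
int32_t getAngleX( void )
{
    int32_t deg = (ax * 10) / GYRO_SIGMA_TO_DEGREESx10 ;
    return (deg < 0) ? deg + 360 : deg ;
}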
Unfortunately, MEMS sensor math is quite complicated.
I would personally use the ready-made libraries provided by ST: https://www.st.com/en/embedded-software/x-cube-mems1.html.
I actually use them, and the results are very good.

Pass audio spectrum to a shader as texture in libGDX

I'm developing an audio visualizer using libGDX.
I want to pass the audio spectrum data (an array containing the FFT of the audio sample) to a shader I took from Shadertoy: https://www.shadertoy.com/view/ttfGzH.
In the GLSL code I expect a uniform containing the data as a texture:
uniform sampler2D iChannel0;
The problem is that I can't figure out how to pass an arbitrary array as a texture to a shader in libGDX.
I have already searched on SO and in libGDX's forum, but there isn't a satisfying answer to my problem.
Here is my Kotlin code (that obviously doesn't work xD):
val p = Pixmap(512, 1, Pixmap.Format.Alpha)
val t = Texture(p)
val map = p.pixels
map.putFloat(....) // fill the map with FFT data
[...]
t.bind(0)
shader.setUniformi("iChannel0", 0)
You could simply use the drawPixel method and store your data in the first channel of each pixel, just like in the shadertoy example (they use the red channel).
float[] fftData = // your data
Color tmpColor = new Color();
Pixmap pixmap = new Pixmap(fftData.length, 1, Pixmap.Format.RGBA8888);
for(int i = 0; i < fftData.length; i++)
{
    tmpColor.set(fftData[i], 0, 0, 0); // using only 1 channel per pixel
    pixmap.drawPixel(i, 0, Color.rgba8888(tmpColor));
}
// then create your texture and bind it to the shader
To be more efficient and require 4x less memory (and possibly fewer samples, depending on the shader), you could use 4 channels per pixel by splitting your data across the r, g, b and a channels. However, this will complicate the shader a bit.
The data being passed in the shader example you provided is not arbitrary, though: it has pretty limited precision and ranges between 0 and 1. If you want to increase precision, you may want to store the floating-point value across multiple channels (although the IEEE recomposition in the shader may be painful) or pass an integer to be scaled down (fixed point). If you need data between -inf and +inf, you may use sigmoid and inverse sigmoid functions, at the cost of greatly reducing the precision again. I believe this technique will work for your example, though, as it seems to only require values between 0 and 1, and precision is not super important because the result is smoothed.
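To make the fixed-point idea concrete, here is a small C sketch (an illustration of the encoding itself, not libGDX code) that splits a value in [0, 1) across four 8-bit channels and recombines it; a GLSL decoder would use slightly different constants, since a sampler returns each channel divided by 255 rather than 256:
#include <stdint.h>

/* Treat v in [0, 1) as 0.32 fixed point and store one byte per channel,
   most significant byte in r. */
static void encode01( double v, uint8_t rgba[4] )
{
    uint32_t fixed = (uint32_t)( v * 4294967296.0 );  /* v * 2^32 */
    rgba[0] = (uint8_t)( fixed >> 24 );
    rgba[1] = (uint8_t)( fixed >> 16 );
    rgba[2] = (uint8_t)( fixed >> 8 );
    rgba[3] = (uint8_t)( fixed );
}

/* Recombine: each successive channel contributes 8 more bits. */
static double decode01( const uint8_t rgba[4] )
{
    return rgba[0] / 256.0
         + rgba[1] / 65536.0
         + rgba[2] / 16777216.0
         + rgba[3] / 4294967296.0;
}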

Strange bitwise operation with Bitmap row width, what does it mean? (And why)

What is the developer doing, and why, when he adds the hex value 0x0F and then applies a bitwise AND with a NOT in this line:
size_t bytesPerRow = ((width * 4) + 0x0000000F) & ~0x0000000F;
He comments that "16 byte aligned is good" - what does he mean?
- (CGContextRef)createBitmapContext {
    CGRect boundingBox = CGPathGetBoundingBox(_mShape);
    size_t width = CGRectGetWidth(boundingBox);
    size_t height = CGRectGetHeight(boundingBox);
    size_t bitsPerComponent = 8;
    size_t bytesPerRow = ((width * 4) + 0x0000000F) & ~0x0000000F; // 16 byte aligned is good
ANDing with ~0x0000000F = 0xFFFFFFF0 (aka -16) rounds down to a multiple of 16, simply by clearing those bits that could make it anything other than a multiple of 16 (the 8's, 4's, 2's and 1's).
Adding 15 (0x0000000F) first makes it round up instead of down.
The purpose of size_t bytesPerRow = ((width * 4) + 0x0000000F) & ~0x0000000F; is to round the value up to a multiple of 16 bytes.
The goal is to set bytesPerRow to be the smallest multiple of 16 that is capable of holding a row of data. This is done so that a bitmap can be allocated where every row address is 16 byte aligned, i.e. a multiple of 16. There are many possible benefits to alignment, including optimizations that take advantage of it. Some APIs may also require alignment.
The code sets the 4 least significant bits to zero. If the value is an address it will be on an even 16 byte boundary, "16 byte aligned".
This is a one's complement, so
~0x0000000F
becomes
0xFFFFFFF0
and ANDing it with another value will clear the 4 least significant bits.
This is the kind of thing we used to do all the time "back in the day"!
He's adding 0xf and then masking out the lower 4 bits (& ~0xf) to make sure the value is rounded up. If he didn't add the 0xf, it would round down.
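As a quick sanity check, here is the idiom as a tiny C program with a few worked values:
#include <stdio.h>
#include <stddef.h>

/* Round n up to the next multiple of 16: adding 15 pushes any value
   with non-zero low bits past the next boundary, and masking with ~15
   clears those low bits. Exact multiples of 16 are left unchanged. */
static size_t roundUpTo16( size_t n )
{
    return ( n + 0x0F ) & ~(size_t)0x0F;
}

int main( void )
{
    printf( "%zu %zu %zu\n",
            roundUpTo16( 10 ),    /* 16 */
            roundUpTo16( 16 ),    /* 16 */
            roundUpTo16( 17 ) );  /* 32 */
    return 0;
}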

How can DWT be used in LSB substitution steganography

In steganography, the least significant bit (LSB) substitution method embeds the secret bits in place of bits from the cover medium, for example, image pixels. In some methods, the Discrete Wavelet Transform (DWT) of the image is taken and the secret bits are embedded in the DWT coefficients, after which the inverse transform is used to reconstruct the stego image.
However, the DWT produces float coefficients, and for the LSB substitution method integer values are required. Most papers I've read use the 2D Haar wavelet, yet they aren't clear on their methodology. I've seen the transform defined in terms of low-pass and high-pass filters (float transforms), or as the sum and difference of pairs of values, or the average and mean difference, etc.
More explicitly, either in the forward or the inverse transform (but not necessarily in both, depending on the formulas used), float numbers will eventually appear. I can't have them for the coefficients, because the substitution won't work, and I can't have them for the reconstructed pixels, because the image requires integer values for storage.
For example, let's consider a pair of pixels, A and B, as a 1D array. The low-frequency coefficient is defined by the sum, i.e., s = A + B, and the high-frequency coefficient by the difference, i.e., d = A - B. We can then reconstruct the original pixels with B = (s - d) / 2 and A = s - B. However, after any bit twiddling with the coefficients, s - d may no longer be even, and float values will emerge for the reconstructed pixels.
For the 2D case, the 1D transform is applied separately to the rows and the columns, so eventually a division by 4 will occur somewhere. This can result in values with fractional parts of .00, .25, .50 and .75. I've only come across one paper which addresses this issue; the rest are very vague in their methodology and I struggle to replicate them. Yet the DWT has been widely implemented for image steganography.
My question is, since some of the literature I've read hasn't been enlightening, how can this be possible? How can one use a transform which introduces float values, yet the whole steganography method requires integers?
One solution that has worked for me is using the Integer Wavelet Transform, which some also refer to as a lifting scheme. For the Haar wavelet, I've seen it defined as:
s = floor((A + B) / 2)
d = A - B
And for inverse:
A = s + floor((d + 1) / 2)
B = s - floor(d / 2)
All the values throughout the whole process are integers. The reason it works is that the formulas retain information about both the even and odd parts of the pixels/coefficients, so there is no loss of information from rounding down. For example, A = 5 and B = 2 give s = 3 and d = 3; the inverse yields A = 3 + floor(4/2) = 5 and B = 3 - floor(3/2) = 2, even though s alone has discarded the half. Even if one modifies the coefficients and then takes the inverse transform, the reconstructed pixels will still be integers.
Example implementation in Python:
import numpy as np

def _iwt(array):
    # 1D forward transform along axis 0: s = floor((A+B)/2), d = A - B
    output = np.zeros_like(array)
    nx, ny = array.shape
    x = nx // 2
    for j in range(ny):
        output[0:x, j] = (array[0::2, j] + array[1::2, j]) // 2
        output[x:nx, j] = array[0::2, j] - array[1::2, j]
    return output

def _iiwt(array):
    # 1D inverse transform: A = s + floor((d+1)/2), B = A - d
    output = np.zeros_like(array)
    nx, ny = array.shape
    x = nx // 2
    for j in range(ny):
        output[0::2, j] = array[0:x, j] + (array[x:nx, j] + 1) // 2
        output[1::2, j] = output[0::2, j] - array[x:nx, j]
    return output

def iwt2(array):
    return _iwt(_iwt(array.astype(int)).T).T

def iiwt2(array):
    return _iiwt(_iiwt(array.astype(int).T).T)
Some languages already have built-in functions for this purpose. For example, Matlab uses lwt2() and ilwt2() for the 2D lifting-scheme wavelet transform.
els = {'p',[-0.125 0.125],0};
lshaarInt = liftwave('haar','int2int');
lsnewInt = addlift(lshaarInt,els);
[cAint,cHint,cVint,cDint] = lwt2(x,lsnewInt); % x is your image
xRecInt = ilwt2(cAint,cHint,cVint,cDint,lsnewInt);
An example article where the IWT was used for image steganography is Raja, K.B. et al. (2008), Robust image adaptive steganography using integer wavelets.

Bit-shifting audio samples from Float32 to SInt16 results in severe clipping

I'm new to iOS and its C underpinnings, but not to programming in general. My dilemma is this: I'm implementing an echo effect in a complex AudioUnits based application. The application needs reverb, echo, and compression, among other things. However, the echo only works right when I use a particular AudioStreamBasicDescription format for the audio samples generated in my app. This format, however, doesn't work with the other AudioUnits.
While there are other ways to solve this problem, fixing the bit-twiddling in the echo algorithm might be the most straightforward approach.
The AudioStreamBasicDescription that works with echo has an mFormatFlags of kAudioFormatFlagsAudioUnitCanonical. Its specifics are:
AudioUnit Stream Format (ECHO works, NO AUDIO UNITS)
Sample Rate: 44100
Format ID: lpcm
Format Flags: 3116 = kAudioFormatFlagsAudioUnitCanonical
Bytes per Packet: 4
Frames per Packet: 1
Bytes per Frame: 4
Channels per Frame: 2
Bits per Channel: 32
Set ASBD on input
Set ASBD on output
au SampleRate rate: 0.000000, 2 channels, 12 formatflags, 1819304813 mFormatID, 16 bits per channel
The stream format that works with AudioUnits is the same except for the mFormatFlags: kAudioFormatFlagIsFloat | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked | kAudioFormatFlagIsNonInterleaved. Its specifics are:
AudioUnit Stream Format (NO ECHO, AUDIO UNITS WORK)
Sample Rate: 44100
Format ID: lpcm
Format Flags: 41
Bytes per Packet: 4
Frames per Packet: 1
Bytes per Frame: 4
Channels per Frame: 2
Bits per Channel: 32
Set ASBD on input
Set ASBD on output
au SampleRate rate: 44100.000000, 2 channels, 41 formatflags, 1819304813 mFormatID, 32 bits per channel
In order to create the echo effect, I use two functions that bit-shift sample data into SInt16 space and back. As I said, this works for the kAudioFormatFlagsAudioUnitCanonical format, but not the other. When it fails, the sounds are clipped and distorted, but they are there. I think this indicates that the difference between these two formats is how the data is arranged in the Float32.
// convert sample vector from fixed point 8.24 to SInt16
void fixedPointToSInt16( SInt32 * source, SInt16 * target, int length ) {
    int i;
    for( i = 0; i < length; i++ ) {
        target[i] = (SInt16) (source[i] >> 9);
        //target[i] *= 0.003;
    }
}
*As you can see, I tried modifying the amplitude of the samples to get rid of the clipping -- clearly that didn't work.
// convert sample vector from SInt16 to fixed point 8.24
void SInt16ToFixedPoint( SInt16 * source, SInt32 * target, int length ) {
    int i;
    for( i = 0; i < length; i++ ) {
        target[i] = (SInt32) (source[i] << 9);
        if( source[i] < 0 ) {
            target[i] |= 0xFF000000;
        }
        else {
            target[i] &= 0x00FFFFFF;
        }
    }
}
If I can determine the difference between kAudioFormatFlagIsFloat | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked | kAudioFormatFlagIsNonInterleaved and the canonical format, then I can modify the above methods accordingly. But I'm not sure how to figure that out. Documentation in CoreAudio is enigmatic, but from what I've read there, and gleaned from the CoreAudioTypes.h file, both mFormatFlags refer to the same fixed-point 8.24 format. Clearly something is different, but I can't figure out what.
Thanks for reading through this long question, and thanks in advance for any insight you can provide.
kAudioFormatFlagIsFloat means that the buffer contains floating-point values. If mBitsPerChannel is 32, then you are dealing with float data (also called Float32); if it is 64, you are dealing with double data.
kAudioFormatFlagsNativeEndian refers to the fact that the data in the buffer matches the endianness of the processor, so you don't have to worry about byte swapping.
kAudioFormatFlagIsPacked means that every bit in the data is significant. For example, if you store 24 bit audio data in 32 bits, this flag will not be set.
kAudioFormatFlagIsNonInterleaved means that each individual buffer consists of one channel of data. It is common for audio data to be interleaved, with the samples alternating between L and R channels: LRLRLRLR. For DSP applications it is often easier to deinterleave the data and work on one channel at a time.
I think in your case the error is that you are treating floating-point data as fixed point. Float data is generally scaled to the interval [-1, +1). To convert float to SInt16, you need to multiply each sample by the maximum 16-bit magnitude (1 << 15, i.e. 32768) and then clip the result to the interval [-32768, 32767].
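As a sketch of that advice - the function names are hypothetical, chosen to mirror the question's fixed-point pair, and the key change is treating the buffers as Float32 rather than 8.24 fixed point:
#include <stdint.h>

typedef float   Float32;   /* standing in for the CoreAudio typedefs */
typedef int16_t SInt16;

// convert sample vector from Float32 in [-1, +1) to SInt16, clipping
// out-of-range samples instead of letting them wrap (wrapping is what
// produces harsh distortion)
void float32ToSInt16( const Float32 * source, SInt16 * target, int length ) {
    int i;
    for( i = 0; i < length; i++ ) {
        Float32 scaled = source[i] * 32768.0f;
        if( scaled >  32767.0f ) scaled =  32767.0f;
        if( scaled < -32768.0f ) scaled = -32768.0f;
        target[i] = (SInt16) scaled;
    }
}

// convert sample vector from SInt16 back to Float32 in [-1, +1)
void SInt16ToFloat32( const SInt16 * source, Float32 * target, int length ) {
    int i;
    for( i = 0; i < length; i++ ) {
        target[i] = source[i] / 32768.0f;
    }
}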