How to manually or automatically optimize HLSL (pixel) shader code?

What are successful strategies to optimize HLSL shader code in terms of computational complexity (meaning: minimizing runtime of the shader)?
I guess one way would be to minimize the number of arithmetic operations that result from compiling the shader.
How could this be done a) manually and b) using automated tools (if they exist)?
Collection of manual techniques (Updated)
Avoid branching (But how to do that best?)
Whenever possible: precompute values outside the shader and pass them in as shader parameters.
Example code:
float2 DisplacementScroll;
// Parameter that limit the water effect
float glowHeight;
float limitTop;
float limitTopWater;
float limitLeft;
float limitRight;
float limitBottom;
sampler TextureSampler : register(s0); // Original color
sampler DisplacementSampler : register(s1); // Displacement
float fadeoutWidth = 0.05;
// External rumble displacement
int enableRumble;
float displacementX;
float displacementY;
float screenZoom;
float4 main(float4 color : COLOR0, float2 texCoord : TEXCOORD0) : COLOR0
{
    // Calculate minimal distance to next border
    float dx = min(texCoord.x - limitLeft, limitRight - texCoord.x);
    float dy = min(texCoord.y - limitTop, limitBottom - texCoord.y);

    ///////////////////////////////////////////////////////////////////////////////////////
    // RUMBLE                                                         /////////////////////
    ///////////////////////////////////////////////////////////////////////////////////////
    if (enableRumble != 0)
    {
        // Limit rumble strength by distance to HLSL-active region (think map)
        // The factor of 100 is chosen by hand and controls the slope with which dimfactor goes to 1
        float dimfactor = clamp(100.0f * min(dx, dy), 0, 1); // Maximum is 1.0 (do not amplify)
        // Shift texture coordinates by rumble
        texCoord.x += displacementX * dimfactor * screenZoom;
        texCoord.y += displacementY * dimfactor * screenZoom;
    }

    //////////////////////////////////////////////////////////////////////////////////////////
    // Water refraction (optical distortion) and water-like color tint  ///////////////////////
    //////////////////////////////////////////////////////////////////////////////////////////
    if (dx >= 0)
    {
        float dyWater = min(texCoord.y - limitTopWater, limitBottom - texCoord.y);
        if (dyWater >= 0)
        {
            // Look up the amount of displacement from texture
            float2 displacement = tex2D(DisplacementSampler, DisplacementScroll + texCoord / 3);
            float finalFactor = min(dx, dyWater) / fadeoutWidth;
            if (finalFactor > 1) finalFactor = 1;
            // Apply displacement by water refraction
            texCoord.x += (displacement.x * 0.2 - 0.15) * finalFactor * 0.15 * screenZoom; // Why these strange numbers ?
            texCoord.y += (displacement.y * 0.2 - 0.15) * finalFactor * 0.15 * screenZoom;
            // Look up the texture color of the original underwater pixel.
            color = tex2D(TextureSampler, texCoord);
            // Additional color transformation (blue shift)
            color.r = color.r - 0.1f;
            color.g = color.g - 0.1f;
            color.b = color.b + 0.3f;
        }
        else if (dyWater > -glowHeight)
        {
            // No water distortion...
            color = tex2D(TextureSampler, texCoord);
            // Scales from 0 (upper glow limit) ... 1 (near water surface)
            float glowFactor = 1 - (dyWater / -glowHeight);
            // ... but bluish glow
            // Additional color transformation
            color.r = color.r - (glowFactor * 0.1); // 24 = 1/(30f/720f); // Prelim: depends on screen resolution, must fit to value in HLSL Update
            color.g = color.g - (glowFactor * 0.1);
            color.b = color.b + (glowFactor * 0.3);
        }
        else
        {
            // Return original color (no water distortion above and below)
            color = tex2D(TextureSampler, texCoord);
        }
    }
    else
    {
        // Return original color (no water distortion left or right)
        color = tex2D(TextureSampler, texCoord);
    }
    return color;
}

technique Refraction
{
    pass Pass0
    {
        PixelShader = compile ps_2_0 main();
    }
}

I'm not very familiar with HLSL internals, but from what I've learned from GLSL: avoid branching. The GPU will probably execute both branches and then decide which of the two results is valid.
Also have a look at this and this.
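To answer the "how to avoid branching" point concretely: one common pattern is to compute both candidate results and blend between them arithmetically. A sketch of the idea in C (my own illustration; in HLSL you would typically use the step, lerp and saturate intrinsics rather than a ternary):
/* Branching version:    if (x >= edge) result = a; else result = b;
   Branch-free version:  build a 0/1 mask and blend both results.   */
float select_ge(float x, float edge, float a, float b)
{
    float mask = (x >= edge) ? 1.0f : 0.0f; /* step(edge, x) in HLSL    */
    return mask * a + (1.0f - mask) * b;    /* lerp(b, a, mask) in HLSL */
}
Both a and b must be cheap to compute for this to win; for expensive branches with coherent conditions, a real branch can still be faster.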
As far as I know there are no automatic tools except the compiler itself. For very low-level optimization you can use fxc with the /Fc parameter to get the assembly listing. The possible assembly instructions are listed here. One low-level optimization worth mentioning is MAD: multiply-add. This may not be optimized to a MAD operation (I'm not sure, just try it out yourself):
a *= b;
a += c;
but this should be optimized to a MAD:
a = (a * b) + c;
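For reference, an fxc invocation that writes such an assembly listing might look like this (file and entry-point names are placeholders):
fxc /T ps_2_0 /E main /Fc refraction.asm refraction.fx
Here /T selects the target profile, /E the entry point, and /Fc the output file for the assembly listing.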

You can optimize your code using mathematical manipulation. For example:
// Shift texture coordinates by rumble
texCoord.x += displacementX * dimfactor * screenZoom;
texCoord.y += displacementY * dimfactor * screenZoom;
Here you multiply three values, but only one of them (dimfactor) is computed in the shader; the other two are uniform constants. You could premultiply them and store the result in a single global constant.
// Shift texture coordinates by rumble
texCoord.x += dimfactor * pre_zoom_dispx; // displacementX * screenZoom
texCoord.y += dimfactor * pre_zoom_dispy; // displacementY * screenZoom
Another example:
// Apply displacement by water refraction
texCoord.x += (displacement.x * 0.2 - 0.15) * finalFactor * 0.15 * screenZoom; // Why these strange numbers ?
texCoord.y += (displacement.y * 0.2 - 0.15) * finalFactor * 0.15 * screenZoom;
0.15 * screenZoom <- can likewise be folded into a single global constant.
The HLSL compiler in Visual Studio 2012 has an option in the project properties to enable optimizations. But the best optimization you can make is to write the HLSL code as simply as possible and to use the intrinsic functions: http://msdn.microsoft.com/en-us/library/windows/desktop/ff471376(v=vs.85).aspx
Those functions are like C's memcpy: their bodies are assembly code that exploits system resources such as 128-bit registers (yes, CPUs have 128-bit registers: http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions) and very fast operations.
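As a CPU-side illustration of the kind of SIMD this refers to (a C sketch with SSE intrinsics, my own example, not HLSL):
#include <xmmintrin.h> /* SSE intrinsics */

/* Multiply-add four float pairs at once: out[i] = a[i] * b[i] + c[i] */
void mad4(const float *a, const float *b, const float *c, float *out)
{
    __m128 va = _mm_loadu_ps(a); /* load 4 unaligned floats */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_loadu_ps(c);
    _mm_storeu_ps(out, _mm_add_ps(_mm_mul_ps(va, vb), vc));
}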

Related

Move camera target position along with the mouse

I am rewriting my application using modern OpenGL (3.3+) in Jogl.
I am using all the conventional matrices, that is objectToWorld, WorldToCamera and CameraToClip (or model, view and projection).
I created a class for handling all the mouse movements, as McKesson does in his "Learning Modern 3D Graphics Programming", with a method to offset the camera target position:
private void offsetTargetPosition(MouseEvent mouseEvent) {
    Mat4 currMat = calcMatrix();
    Quat orientation = currMat.toQuaternion();
    Quat invOrientation = orientation.conjugate();

    Vec2 current = new Vec2(mouseEvent.getX(), mouseEvent.getY());
    Vec2 diff = current.minus(startDragMouseLoc);

    Vec3 worldOffset = invOrientation.mult(new Vec3(-diff.x * 10, diff.y * 10, 0.0f));

    currView.setTargetPos(currView.getTargetPos().plus(worldOffset));
    startDragMouseLoc = current;
}
calcMatrix() returns the camera matrix; the rest should be clear.
What I want is to move my object along with the mouse. Right now mouse movement and object translation don't correspond, that is, they are not linear, because I am dealing with different spaces, I guess.
I learnt that if I want to apply a transformation T in space O, but expressed relative to space C, I should do the following, with p as the vertex:
C * (C * T * C^-1) * O * p
Should I do something similar?
I solved it with a damn simple proportion...
float x = (float) (10000 * 2 * EC_Main.viewer.getAspect() * diff.x / EC_Main.viewer.getWidth());
float y = (float) (10000 * 2 * diff.y / EC_Main.viewer.getHeight());
Vec3 worldOffset = invOrientation.mult(new Vec3(-x, y, 0.0f));
Taking into account my projection matrix
Mat4 orthographicMatrix = Jglm.orthographic(
        -10000.0f * (float) EC_Main.viewer.getAspect(), 10000.0f * (float) EC_Main.viewer.getAspect(),
        -10000.0f, 10000.0f, -10000.0f, 10000.0f);
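A note on why the proportion works (my own sketch, with illustrative names): for an orthographic projection, every screen pixel covers a fixed world-space distance, so a mouse delta in pixels maps linearly to a world offset.
/* World-space offset per pixel for an orthographic projection
   (names are illustrative, matching the question's 10000-unit frustum). */
void mouseDeltaToWorld(float diffX, float diffY,
                       float left, float right, float bottom, float top,
                       int viewportW, int viewportH,
                       float *outX, float *outY)
{
    float worldPerPixelX = (right - left) / viewportW; /* = 2 * 10000 * aspect / width */
    float worldPerPixelY = (top - bottom) / viewportH; /* = 2 * 10000 / height         */
    *outX = diffX * worldPerPixelX;
    *outY = diffY * worldPerPixelY;
}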

Goertzel algorithm - amplitude goes down, and other issues

I am using Goertzel to identify a certain frequency.
What I see is that it works great, but in a strange way: when I feed it samples (±500/1024) I get the right values, but they become lower and lower until zero, while the frequency is STILL there. So I get, for example, 700, and then it slowly goes down...
Also, I would like to make it more exponential, so that differences between noise and the target frequency are higher.
What can cause this problem, and how can I improve my code?
Thanks.
float goertzel_mag(int16_t* data, int SAMPLING_RATE, double TARGET_FREQUENCY, int numSamples)
{
    int k, i;
    float floatnumSamples;
    float omega, sine, cosine, coeff, q0, q1, q2, magnitude, real, imag;
    float scalingFactor = numSamples / 2.0; // -2

    floatnumSamples = (float) numSamples;
    k = (int) (0.5 + ((floatnumSamples * TARGET_FREQUENCY) / SAMPLING_RATE));
    omega = (2.0 * M_PI * k) / floatnumSamples;
    sine = sin(omega);
    cosine = cos(omega);
    coeff = 2.0 * cosine;
    q0 = 0;
    q1 = 0;
    q2 = 0;

    for (i = 0; i < numSamples; i++)
    {
        q0 = coeff * q1 - q2 + data[i];
        q2 = q1;
        q1 = q0;
    }

    real = (q1 - q2 * cosine) / scalingFactor;
    imag = (q2 * sine) / scalingFactor;
    //double theta = atan2(imag, real); // PHASE
    magnitude = sqrtf(real*real + imag*imag);
    return magnitude;
}
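A minimal way to sanity-check the function (my own test harness, not from the question): feed it a steady synthetic tone; the magnitude should then stay constant from call to call, so if it decays with real input, the capture path rather than the algorithm is suspect.
#include <math.h>
#include <stdint.h>
#include <stdio.h>

float goertzel_mag(int16_t* data, int SAMPLING_RATE, double TARGET_FREQUENCY, int numSamples);

int main(void)
{
    enum { N = 1024 };
    const int fs = 44100;    /* sampling rate in Hz (assumed) */
    const double f0 = 700.0; /* target frequency in Hz */
    int16_t samples[N];

    /* Synthesize a steady 700 Hz sine with amplitude 500 */
    for (int i = 0; i < N; i++)
        samples[i] = (int16_t)(500.0 * sin(2.0 * M_PI * f0 * i / fs));

    /* Should print roughly the same value on every run */
    printf("magnitude = %f\n", goertzel_mag(samples, fs, f0, N));
    return 0;
}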
After SO much research about Goertzel, I found out that the problem is not the algorithm.
When I input a pure sine wave to the Mac, and print out the buffer:
int16_t *q = (int16_t *)(&bufferList)->mBuffers[0].mData;
its values become high, but after 5 seconds the signal goes lower and lower towards zero!
Moving the signal source makes it become higher again, then go down again.
From what I have read, the channel can go into saturation, and maybe this can cause the problem.
The Goertzel algorithm itself is very good.

Calculate ray direction vector from screen coordinate

I'm looking for a better way (or a note that this is the best way) to convert a pixel coordinate to its corresponding ray direction from an arbitrary camera position/direction.
My current method is as follows. I define a "camera" as a position vector, lookat vector, and up vector, named as such. (Note that the lookat vector is a unit vector in the direction the camera is facing, NOT a point where (position - lookat) gives the direction, as is the standard in XNA's Matrix.CreateLookAt.) These three vectors uniquely define a camera. Here's the actual code (well, not really the actual code, a simplified abstracted version) (language is HLSL):
float xPixelCoordShifted = (xPixelCoord / screenWidth * 2 - 1) * aspectRatio;
float yPixelCoordShifted = yPixelCoord / screenHeight * 2 - 1;
float3 right = cross(lookat, up);
float3 actualUp = cross(right, lookat);
float3 rightShift = mul(right, xPixelCoordShifted);
float3 upShift = mul(actualUp, yPixelCoordShifted);
return normalize(lookat + rightShift + upShift);
(the return value is the direction of the ray)
So what I'm asking is this: what's a better way to do this, maybe using matrices, etc.? The problem with this method is that if you have too wide a viewing angle, the edges of the screen get sort of "radially stretched".
You can calculate the ray in the pixel shader. HLSL code:
float4x4 WorldViewProjMatrix;    // World*View*Proj
float4x4 WorldViewProjMatrixInv; // (World*View*Proj)^(-1)

void VS( float4 vPos : POSITION,
         out float4 oPos : POSITION,
         out float4 pos : TEXCOORD0 )
{
    oPos = mul(vPos, WorldViewProjMatrix);
    pos = oPos;
}

float4 PS( float4 pos : TEXCOORD0 )
{
    float4 posWS = mul(pos, WorldViewProjMatrixInv);
    float3 ray = posWS.xyz / posWS.w;
    return float4(0, 0, 0, 1);
}
The information about your camera's position and direction is in the View matrix (Matrix.CreateLookAt).
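If you would rather build the ray on the CPU, the same unprojection idea works there too. A sketch in C (my own, not from the answer above; it assumes row vectors with a row-major inverse view-projection matrix, matching mul(vector, matrix), and D3D-style clip space with z = 1 at the far plane):
#include <math.h>

typedef struct { float x, y, z; } Vec3;

/* m is a 4x4 matrix in row-major order; v is treated as a row vector. */
static void mulRowVec(const float m[16], const float v[4], float out[4])
{
    for (int c = 0; c < 4; c++)
        out[c] = v[0]*m[0*4+c] + v[1]*m[1*4+c]
               + v[2]*m[2*4+c] + v[3]*m[3*4+c];
}

/* Unproject a pixel to a normalized world-space ray direction.
   invViewProj is the inverse of View*Projection; camPos is the eye. */
Vec3 pixelToRay(float px, float py, int width, int height,
                const float invViewProj[16], Vec3 camPos)
{
    /* pixel -> normalized device coordinates (-1..1, y flipped) */
    float ndc[4] = { 2.0f*px/width - 1.0f, 1.0f - 2.0f*py/height, 1.0f, 1.0f };
    float ws[4];
    mulRowVec(invViewProj, ndc, ws);

    /* far-plane point minus camera position gives the ray direction */
    Vec3 dir = { ws[0]/ws[3] - camPos.x,
                 ws[1]/ws[3] - camPos.y,
                 ws[2]/ws[3] - camPos.z };
    float len = sqrtf(dir.x*dir.x + dir.y*dir.y + dir.z*dir.z);
    dir.x /= len; dir.y /= len; dir.z /= len;
    return dir;
}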

Working out Oscillator wave type code, and creating new wave types

I want to work out a bit of code that generates the oscillator wave-type in my tone generator app. The one in this example is a sine wave; can someone tell me how the code works, as I want to create custom wave-types in the future, such as square, sawtooth and triangle types.
OSStatus RenderTone(
    void *inRefCon,
    AudioUnitRenderActionFlags *ioActionFlags,
    const AudioTimeStamp *inTimeStamp,
    UInt32 inBusNumber,
    UInt32 inNumberFrames,
    AudioBufferList *ioData)
{
    // Fixed amplitude is good enough for our purposes
    const double amplitude = 0.25;

    // Get the tone parameters out of the view controller
    ToneGeneratorViewController *viewController =
        (ToneGeneratorViewController *)inRefCon;
    double theta = viewController->theta;
    double theta_increment = 2.0 * M_PI * viewController->frequency / viewController->sampleRate;

    // This is a mono tone generator so we only need the first buffer
    const int channel = 0;
    Float32 *buffer = (Float32 *)ioData->mBuffers[channel].mData;

    // Generate the samples
    for (UInt32 frame = 0; frame < inNumberFrames; frame++)
    {
        buffer[frame] = sin(theta) * amplitude;
        theta += theta_increment;
        if (theta > 2.0 * M_PI)
        {
            theta -= 2.0 * M_PI;
        }
    }

    // Store the theta back in the view controller
    viewController->theta = theta;
    return noErr;
}
The actual sine wave samples are being generated and are populating the buffer in the snippet below
for (UInt32 frame = 0; frame < inNumberFrames; frame++)
{
    buffer[frame] = sin(theta) * amplitude;
    theta += theta_increment;
    if (theta > 2.0 * M_PI)
    {
        theta -= 2.0 * M_PI;
    }
}
In the line where buffer[frame] is being assigned, you are calling sin(theta) * amplitude, and for each iteration of the for loop you are incrementing theta by a finite step size based on your frequency and sample rate, via
double theta_increment = 2.0 * M_PI * viewController->frequency / viewController->sampleRate;
which is essentially dividing 2.0 * PI * frequency by your sample rate.
Incrementing the theta variable while looping through the for loop is basically advancing the time step one sample at a time until your buffer is full (i.e. frame == inNumberFrames).
If you wanted to generate something other than a sine wave, you would simply replace the following line with some other function:
buffer[frame] = sin(theta) * amplitude;
Let's say, for example, you wanted the first three terms of the infinite Fourier series that converges to a triangle wave; you might then have the following instead:
buffer[frame] = (8 / pow(M_PI,2)) * (sin(theta) - sin(3*theta)/9 + sin(5*theta)/25);
To produce your desired waveform, you need to replace the sin() function with a function that produces your desired wave shape.
You might be able to find this function in a table of functions with graphical examples, or you might have to create your own. There are lots of ways to create a functional approximation, including polynomials, Fourier series, table lookup with or without interpolation, recursion, and so on. But that is a big subject on its own (many textbooks, etc.). A few naive examples are sketched below.
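Here are naive (non-band-limited, so they will alias at high frequencies) square, sawtooth and triangle shapes as functions of the running phase theta (a C sketch, my own); any of them could replace sin(theta) in the loop above:
#include <math.h>

/* theta is the running phase in [0, 2*pi), as in the render loop above */

double square_wave(double theta)
{
    return (theta < M_PI) ? 1.0 : -1.0;
}

double sawtooth_wave(double theta)
{
    return (theta / M_PI) - 1.0;       /* ramps from -1 up to 1 */
}

double triangle_wave(double theta)
{
    double saw = (theta / M_PI) - 1.0; /* -1 .. 1 */
    return 2.0 * fabs(saw) - 1.0;      /* fold the ramp into a triangle */
}
Usage in the loop would then be, e.g., buffer[frame] = triangle_wave(theta) * amplitude;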

Map GPS Coordinates to an Image and draw some GPS Points on it

I have some problems figuring out where my error is. Here is my setup:
I have an image and the corresponding GPS coordinates of its top-left and bottom-right corners.
E.g.:
topLeft.longitude = 8.235128;
topLeft.latitude = 49.632383;
bottomRight.longitude = 8.240547;
bottomRight.latitude = 49.629808;
Now I have a point that lies on that map:
p.longitude = 8.238567;
p.latitude = 49.630664;
I draw my image in landscape fullscreen (1024*748).
Now I want to calculate the exact pixel position (x, y) of my point.
To do that, I am trying to use the great-circle distance approach from here: Link.
CGFloat DegreesToRadians(CGFloat degrees)
{
    return degrees * M_PI / 180;
};

- (float) calculateDistanceP1:(CLLocationCoordinate2D)p1 andP2:(CLLocationCoordinate2D)p2 {
    double circumference = 40000.0; // Earth's circumference in km at the equator
    double distance = 0.0;
    double latitude1Rad = DegreesToRadians(p1.latitude);
    double longitude1Rad = DegreesToRadians(p1.longitude);
    double latititude2Rad = DegreesToRadians(p2.latitude);
    double longitude2Rad = DegreesToRadians(p2.longitude);
    double logitudeDiff = fabs(longitude1Rad - longitude2Rad);
    if (logitudeDiff > M_PI)
    {
        logitudeDiff = 2.0 * M_PI - logitudeDiff;
    }
    double angleCalculation =
        acos(sin(latititude2Rad) * sin(latitude1Rad) + cos(latititude2Rad) * cos(latitude1Rad) * cos(logitudeDiff));
    distance = circumference * angleCalculation / (2.0 * M_PI);
    NSLog(@"%f", distance);
    return distance;
}
Here is my code for getting the pixel position:
- (CGPoint) calculatePoint:(CLLocationCoordinate2D)p {
    float x_coord;
    float y_coord;

    CLLocationCoordinate2D x1;
    CLLocationCoordinate2D x2;
    x1.longitude = p.longitude;
    x1.latitude = topLeft.latitude;
    x2.longitude = p.longitude;
    x2.latitude = bottomRight.latitude;

    CLLocationCoordinate2D y1;
    CLLocationCoordinate2D y2;
    y1.longitude = topLeft.longitude;
    y1.latitude = p.latitude;
    y2.longitude = bottomRight.longitude;
    y2.latitude = p.latitude;

    float distanceX = [self calculateDistanceP1:x1 andP2:x2];
    float distanceY = [self calculateDistanceP1:y1 andP2:y2];
    float distancePX = [self calculateDistanceP1:x1 andP2:p];
    float distancePY = [self calculateDistanceP1:y1 andP2:p];

    x_coord = fabs(distancePX * (1024 / distanceX)) - 1;
    y_coord = fabs(distancePY * (748 / distanceY)) - 1;

    return CGPointMake(x_coord, y_coord);
}
x1 and x2 are the points on the longitude of p with the latitudes of topLeft and bottomRight.
y1 and y2 are the points on the latitude of p with the longitudes of topLeft and bottomRight.
So I get the distance between left and right at the longitude of p, and the distance between top and bottom at the latitude of p (needed to calculate the pixel position).
Then I calculate the distance between x1 and p (my distance between x_0 and x_p), and after that the distance between y1 and p (the distance between y_0 and y_p).
Last but not least, the pixel position is calculated and returned.
The result is that my point lands on the red and NOT on the blue position:
Maybe you can find my mistakes, or have suggestions for improving the accuracy.
Maybe I didn't understand your question, but shouldn't you be using the Converting Map Coordinates methods of MKMapView?
See this image.
I used your coordinates, and simply did the following:
x_coord = 1024 * (p.longitude - topLeft.longitude)/(bottomRight.longitude - topLeft.longitude);
y_coord = 748 - (748 * (p.latitude - bottomRight.latitude)/(topLeft.latitude - bottomRight.latitude));
The red dot marks this point. For such small distances you don't really need to use great circles, and your rounding errors will be making things much more inaccurate.
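For completeness, the two lines above wrapped into a self-contained function (a C sketch of the same linear mapping; the 1024x748 size and the flat-map assumption come from the question):
typedef struct { double latitude, longitude; } Coord2D;
typedef struct { double x, y; } PixelPoint;

/* Linear (equirectangular) mapping of a GPS coordinate to a pixel
   inside a 1024x748 image bounded by topLeft and bottomRight. */
PixelPoint coordToPixel(Coord2D topLeft, Coord2D bottomRight, Coord2D p)
{
    PixelPoint r;
    r.x = 1024.0 * (p.longitude - topLeft.longitude)
                 / (bottomRight.longitude - topLeft.longitude);
    r.y = 748.0 - 748.0 * (p.latitude - bottomRight.latitude)
                        / (topLeft.latitude - bottomRight.latitude);
    return r;
}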