I'm developing an audio visualizer using libGDX.
I want to pass the audio spectrum data (an array containing the FFT of the audio sample) to a shader I took from Shadertoy: https://www.shadertoy.com/view/ttfGzH.
In the GLSL code I expect a uniform containing the data as a texture:
uniform sampler2D iChannel0;
The problem is that I can't figure out how to pass an arbitrary array as a texture to a shader in libGDX.
I already searched on SO and in libGDX's forum, but there isn't a satisfying answer to my problem.
Here is my Kotlin code (that obviously doesn't work xD):
val p = Pixmap(512, 1, Pixmap.Format.Alpha)
val t = Texture(p)
val map = p.pixels
map.putFloat(....) // fill the map with FFT data
[...]
t.bind(0)
shader.setUniformi("iChannel0", 0)
You could simply use the drawPixel method and store your data in the first channel of each pixel, just like in the Shadertoy example (it uses the red channel).
float[] fftData = // your data
Color tmpColor = new Color();
Pixmap pixmap = new Pixmap(fftData.length, 1, Pixmap.Format.RGBA8888);
for(int i = 0; i < fftData.length; i++)
{
    tmpColor.set(fftData[i], 0, 0, 0); // using only 1 channel per pixel
    pixmap.drawPixel(i, 0, Color.rgba8888(tmpColor));
}
// then create your texture and bind it to the shader
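To connect this back to the Kotlin in the question, the remaining step could look roughly like the sketch below (shader and the iChannel0 uniform name come from the question, pixmap is the Pixmap filled above, and it is assumed the shader is already bound):
val texture = Texture(pixmap)
// each frame, after refilling the pixmap with fresh FFT data:
texture.draw(pixmap, 0, 0)         // re-upload the pixel data to the GPU
texture.bind(0)                    // bind to texture unit 0
shader.setUniformi("iChannel0", 0) // point the sampler uniform at unit 0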
To be more efficient and use 4x less memory (and possibly fewer samples, depending on the shader), you could use 4 channels per pixel by splitting your data across the r, g, b and a channels. However, this will complicate the shader a bit.
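For illustration, that packing could look something like this in Kotlin (a rough sketch, assuming fftData holds values already normalized to 0..1 and its length is a multiple of 4):
val pixmap = Pixmap(fftData.size / 4, 1, Pixmap.Format.RGBA8888)
for (i in 0 until fftData.size / 4) {
    // four consecutive FFT samples go into the r, g, b and a channels of one pixel
    pixmap.drawPixel(i, 0, Color.rgba8888(
        fftData[i * 4], fftData[i * 4 + 1], fftData[i * 4 + 2], fftData[i * 4 + 3]
    ))
}
// in the shader, one texel fetch then yields 4 samples: tex.r, tex.g, tex.b, tex.a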
The data being passed in the shader example you provided is not arbitrary, though: it has fairly limited precision and ranges between 0 and 1. If you want to increase precision, you may want to store the floating-point value across multiple channels (although the IEEE recomposition in the shader may be painful) or pass an integer to be scaled down (fixed point). If you need data between -inf and inf, you may use sigmoid and inverse-sigmoid functions, at the cost of greatly reducing the precision again. I believe this technique will work for your example, though, as it only seems to require values between 0 and 1, and precision is not super important because the result is smoothed.
tl;dr: I've got two audio recordings of the same song without timestamps, and I'd like to align them. I believe FFT is the way to go, but while I've come a long way, it feels like I'm right on the edge of understanding enough to make it work, and I would greatly benefit from some "you got this part wrong" advice on FFT. (My education never got into this area.) So I came here seeking ELI5 help.
The journey:
1. Get two recordings at the same sample rate. (done!)
2. Transform them into a waveform (DoubleArray). This doesn't keep any of the meta info like "samples/second", but the FFT math doesn't care until later.
3. Run an FFT on them using a simplified implementation for beginners.
4. Get an Array<Frame>, where each Frame contains an Array<Bin> and each Bin has (amplitude, frequency), because the older implementation hid all the details (like frame width, and number of Bins, and ... stuff?) and outputs words I'm familiar with like "amplitude" and "frequency".
5. Try moving to a more robust FFT (Apache Commons).
6. Get an output of 'real' and 'imaginary'. (uh oh)
7. Make the totally incorrect assumption that those were the same thing (amplitude and frequency). Surprise, they aren't!
8. Apache's FFT returns an Array<Complex>, which means it... er... is just one frame's worth? And I should be chopping the song into 1-second chunks, passing each one into the FFT, and calling it multiple times? That seems strange; how does it get lower frequencies?
To the best of my understanding, the complex number is a way to convey the phase shift and amplitude in one neat container (and you need phase shift if you want to do the FFT in reverse). And the frequency is calculated from the index of the array.
Which works out to (pseudocode in Kotlin)
val audioFile = File("Dream_On.pcm")
val (phases, frequencies, amplitudes) = AudioInputStream(
    audioFile.inputStream(),
    AudioFormat(
        /* encoding = */ AudioFormat.Encoding.PCM_SIGNED,
        /* sampleRate = */ 44100f,
        /* sampleSizeInBits = */ 16,
        /* channels = */ 2,
        /* frameSize = */ 4,
        /* frameRate = */ 44100f,
        /* bigEndian = */ false
    ),
    (audioFile.length() / /* frameSize */ 4)
).use { ais ->
    val bytes = ais.readAllBytes()
    val shorts = ShortArray(bytes.size / 2)
    ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(shorts)
    val allWaveform = DoubleArray(shorts.size)
    for (i in shorts.indices) {
        allWaveform[i] = shorts[i].toDouble()
    }
    val halfwayThroughSong = allWaveform.size / 2
    val moreThanOneSecond = allWaveform.copyOfRange(halfwayThroughSong, halfwayThroughSong + findNextPowerOf2(44100))
    val fft = FastFourierTransformer(DftNormalization.STANDARD)
    val fftResult: Array<Complex> = fft.transform(moreThanOneSecond, TransformType.FORWARD)
    println("fftResult size: ${fftResult.size}")
    val phases = DoubleArray(fftResult.size / 2)
    val amplitudes = DoubleArray(fftResult.size / 2)
    val frequencies = DoubleArray(fftResult.size / 2)
    fftResult.filterIndexed { index, _ -> index < fftResult.size / 2 }.forEachIndexed { idx, complex ->
        phases[idx] = atan2(complex.imaginary, complex.real)
        frequencies[idx] = idx * 44100.0 / fftResult.size
        amplitudes[idx] = hypot(complex.real, complex.imaginary)
    }
    Triple(phases, frequencies, amplitudes)
}
Is my step #8 at all close to the truth? Why would the FFT result return an array as big as my input number of samples? That makes me think I've got the "window" or "frame" part wrong.
I read up on
FFT real/imaginary/abs parts interpretation
Converting Real and Imaginary FFT output to Frequency and Amplitude
Java - Finding frequency and amplitude of audio signal using FFT
An audio recording in waveform is a series of sound energy levels: basically, how much sound energy there should be at any one instant. Given the sample rate, you can think of the whole recording as a graph of energy versus time.
Sound is made of waves, which have frequencies and amplitudes. Unless your recording is of a pure sine wave, it will have many different waves of sound coming and going, which summed together create the total sound that you experience over time. At any one instant of time, you have energy from many different waves added together. Some of those waves may be at their peaks, and some at their valleys, or anywhere in between.
An FFT is a way to convert energy-vs.-time data into amplitude-vs.-frequency data. The input to an FFT is a block of the waveform. You can't just give it a single energy level from a single point in time, because then there is no way to determine all the waves that add together to make up the amplitude at that point in time. So you give it a series of amplitudes over some finite period of time.
The FFT then does its math and returns a range of complex numbers that represent the waves of sound over that chunk of time which, when added together, would recreate the series of energy levels over that block of time. That's why the return value is an array: it represents a bunch of frequency ranges. Together, the data in the array represents the same energy as the input array.
From the complex numbers you can calculate both the phase shift and the amplitude for each frequency range represented in the return array.
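For concreteness, that calculation could look like the Kotlin sketch below (it uses the same Apache Commons Complex type as the question; the Bin class and describeBin are just illustrative names):
import org.apache.commons.math3.complex.Complex
import kotlin.math.atan2
import kotlin.math.hypot

data class Bin(val frequencyHz: Double, val amplitude: Double, val phase: Double)

// Bin k of an N-point FFT taken at sampleRate covers frequency k * sampleRate / N;
// the complex value encodes the amplitude (its magnitude) and the phase shift (its angle).
fun describeBin(fft: Array<Complex>, k: Int, sampleRate: Double): Bin =
    Bin(
        frequencyHz = k * sampleRate / fft.size,
        amplitude = hypot(fft[k].real, fft[k].imaginary),
        phase = atan2(fft[k].imaginary, fft[k].real)
    )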
Ultimately, I don't see why performing an FFT would get you any closer to syncing your recordings. Admittedly it's not a task I've tried before, but I would think waveform data is already the perfect form for comparing the data and finding matching patterns. If you break your songs up into chunks to perform FFTs on, then you can try to find matching FFTs, but they will only match perfectly if your chunks are divided exactly along the same points relative to the beginning of the original recording. And even if you could guarantee that and found matching FFTs, you will only have as much precision as the size of your chunks.
But when I think of apps like Shazam, I realize they must be doing some sort of manipulation of the audio that breaks it down into something simpler for rapid comparison. That possibly involves some FFT manipulation and filtering.
Maybe you could compare FFTs using some algorithm that just finds ones that are pretty similar, to narrow things down to a time range, and then compare waveform data in that range to find the exact point of synchronization.
I would imagine the approach that would work well is to find the offset with the maximum cross-correlation between the two recordings. This means calculating the cross-correlation between the two pieces at various offsets; you would expect the maximum cross-correlation to occur at the offset where the two pieces are best aligned.
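A brute-force version of that search could look like the Kotlin sketch below (illustrative only; the names are made up, and for long recordings you would want an FFT-based cross-correlation or to correlate short excerpts rather than the full waveforms):
// Slide b against a and return the offset (in samples) with the highest correlation.
fun bestOffset(a: DoubleArray, b: DoubleArray, maxOffset: Int): Int {
    var best = 0
    var bestScore = Double.NEGATIVE_INFINITY
    for (offset in -maxOffset..maxOffset) {
        var score = 0.0
        for (i in a.indices) {
            val j = i + offset
            if (j in b.indices) score += a[i] * b[j]
        }
        if (score > bestScore) {
            bestScore = score
            best = offset
        }
    }
    return best
}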
I have a similar question to Writing an Exodus II file programmatically. The linked question still has no accepted answer.
I am performing numerical simulations in MATLAB. My output consists of polyhedra in 3d space and I know the displacement for every vertex in terms of a displacement vector for a finite number of time steps. I would like to visualize the displacement as an animation in Paraview. On YouTube I found a tutorial on animations for the can.ex2 example in Paraview.
Therefore, I would like to export my data (i.e. initial vertex positions + displacement for each time step) to Exodus 2 format or similar. If possible, I would like to avoid any existing library and write the file myself in MATLAB. I was already successful with .vtk/.vti/.obj/... writers for other parts of my project.
Can someone recommend a comprehensive description of how .ex2 files should be written and/or code that I can orient myself on? Unfortunately, I was not very successful with my own research. I am also open to suggestions of similar file formats that would be sufficient for my plans.
Edit
Because I was asked for a code example:
% Vertices of unit triangle contained in (x,y)-plane
initialVertexPos = [0, 1, 0;
                    0, 0, 1;
                    0, 0, 0];
nVertices = size(initialVertexPos, 2);
% Linear displacement of all vertices along z-axis
nTimeSteps = 10;
disp_x = zeros(nTimeSteps, nVertices);
disp_y = zeros(nTimeSteps, nVertices);
disp_z = repmat(linspace(0,1,nTimeSteps)', 1, nVertices);
% Position of vertex kVertex at time step kTime is given by
% initialVertexPos(:,kVertex) + [disp_x(kTime,kVertex); disp_y(kTime,kVertex); disp_z(kTime,kVertex)]
I have data regarding the redness of the user's finger that is currently quite noisy, so I'd like to run it through an FFT to reduce the noise. The data on the left side of this image is similar to my data currently. I've familiarized myself with the Apple documentation regarding vDSP, but there doesn't seem to be a clear or concise guide on how to implement a Fast Fourier Transform using Apple's vDSP and the Accelerate framework. How can I do this?
I have already referred to this question, which is on a similar topic, but is significantly outdated and doesn't involve vDSP.
Using vDSP for FFT calculations is pretty easy. I'm assuming you have real values on input. The only thing you need to keep in mind is that you need to convert your real-valued array to the packed complex array that the FFT algorithm from vDSP uses internally.
You can see a good overview in the documentation:
https://developer.apple.com/library/content/documentation/Performance/Conceptual/vDSP_Programming_Guide/UsingFourierTransforms/UsingFourierTransforms.html
Here is a minimal example of calculating a real-valued FFT:
#include <Accelerate/Accelerate.h>

const int n = 1024;
const int log2n = 10; // 2^10 = 1024

float *input = new float[n]; // your real-valued samples go here

DSPSplitComplex a;
a.realp = new float[n/2];
a.imagp = new float[n/2];
// prepare the fft algo (you want to reuse the setup across fft calculations)
FFTSetup setup = vDSP_create_fftsetup(log2n, kFFTRadix2);
// copy the input to the packed complex array that the fft algo uses
vDSP_ctoz((DSPComplex *) input, 2, &a, 1, n/2);
// calculate the fft
vDSP_fft_zrip(setup, &a, 1, log2n, FFT_FORWARD);
// do something with the complex spectrum
for (size_t i = 0; i < n/2; ++i) {
    a.realp[i];
    a.imagp[i];
}
// release the setup once you are done with all FFT calculations
vDSP_destroy_fftsetup(setup);
One trick is that a.realp[0] is the DC offset and a.imagp[0] is the real valued magnitude at the Nyquist frequency.
Intro
I am trying to render squares in DirectX 11 in the most efficient way. Each square has a color (float3) and a position (float3). A typical count of squares is about 5 million.
I tried 3 ways:
Render raw data
Use geometry shader
Use instanced rendering
Raw data means that each square is represented as 4 vertices in the vertex buffer and two triangles in the index buffer.
Geometry shader and instanced rendering mean that each square has just one vertex in the vertex buffer.
My results (on nvidia GTX960M) for 5M squares are:
Geometry shader 22 FPS
Instanced rendering 30 FPS
Raw data rendering 41 FPS
I expected that the geometry shader would not be the most efficient method. On the other hand, I am surprised that instanced rendering is slower than raw data. The computation in the vertex shader is exactly the same: just a multiplication with a transform matrix stored in a constant buffer plus the addition of the Shift variable.
Raw data input
struct VSInput{
    float3 Position : POSITION0;
    float3 Color : COLOR0;
    float2 Shift : TEXCOORD0; // xy deviation from the square center
};
Instanced rendering input
struct VSInputPerVertex{
    float2 Shift : TEXCOORD0;
};
struct VSInputPerInstance{
    float3 Position : POSITION0;
    float3 Color : COLOR0;
};
Note
For bigger models (20M squares), instanced rendering is more efficient (evidently because of memory traffic).
Question
Why is instanced rendering slower than raw data rendering in the case of 5M squares? Is there another efficient way to accomplish this rendering task? Am I missing something?
Edit
StructuredBuffer method
One possible solution is to use a StructuredBuffer, as galop1n suggested (for details see his answer).
My results (on nvidia GTX960M) for 5M squares
StructuredBuffer 48 FPS
Observations
Sometimes I observed that the StructuredBuffer method was oscillating between 30 FPS and 55 FPS (accumulated over 100 frames). It seems to be a little unstable; the median is 48 FPS. I did not observe this with the previous methods.
Consider the balance between the number of draw calls and the StructuredBuffer sizes. For smaller models I reached the fastest behavior with buffers of 1K - 4K points. When I tried to render the 5M square model that way, I had a large number of draw calls and it was not efficient (30 FPS). The best behavior I observed with 5M squares was with 16K points per buffer; 32K and 8K points per buffer seemed to be slower settings.
A small vertex count per instance is usually a good way to leave the hardware underused. I suggest the following variant instead; it should provide good performance on every vendor's hardware.
context->VSSetShaderResources(0, 1, &quadData);
context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
context->Draw(6 * quadCount, 0);
In the vertex shader, you have
struct Quad {
    float3 pos;
    float3 color;
};
StructuredBuffer<Quad> quads : register(t0);
And to rebuild your quads in the vertex shader:
// shift for each vertex
static const float2 shifts[6] = { float2(-1,-1), ..., float2(1,1) };

void main( uint vtx : SV_VertexID, out YourStuff yourStuff) {
    Quad quad = quads[vtx / 6];
    float2 offs = shifts[vtx % 6];
    // ... rebuild the corner position from quad.pos and offs, transform, and fill yourStuff
}
Then rebuild the vertex and transform as usual. Note that because you bypass the input assembler stage, if you want to send colors as rgba8 you need to store them in a uint and unpack them manually in the shader. The bandwidth usage will be lower if you have millions of quads to draw.
I am new to GLUT and OpenGL. I need to draw a scatterplot matrix for an n-dimensional array.
I have saved the data from a CSV into a vector of vectors, where each vector corresponds to a row. I have plotted just one scatterplot and used GL_LINES to draw the grid. My questions:
1. How do I draw points in a particular grid cell? Using GL_POINTS I can only draw points in the entire window.
Please let me know if you need any further info to answer this question.
Thanks
What you need to do is be able to transform your data's (x,y) coordinates into screen coordinates. The most straightforward way to do it actually does not rely on OpenGL or GLUT. All you have to do is use a little math. Determine the screen (x,y) coordinates of the place where you want a datapoint for (0,0) to be on the screen, and then determine how far apart you want one increment to be on the screen. Simply take your original data points, apply the offset, and then scale them, to get your screen coordinates, which you then pass into glVertex2f() (or whatever function you are using to specify points in your API).
For instance, you might decide you want point (0,0) in your data to be at location (200,0) on your screen, and the distance between 0 and 1 in your data to be 30 pixels on the screen. This operation will look like this:
int x = 0, y = 0;               // Original data points
int scaleX = 30, scaleY = 30;   // Scaling values for each component
int offsetX = 200, offsetY = 0; // Where you want the origin of your graph to be
// Apply the scaling values and offsets:
int screenX = x * scaleX + offsetX;
int screenY = y * scaleY + offsetY;
// Calls to your drawing functions using screenX and screenY as your coordinates
You will have to determine values that make sense for the scaling and offsets. You can also have your program use different values for different sets of data, so you can display multiple graphs on the same screen.
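In a scatterplot matrix, that just means giving each grid cell its own offset (and possibly its own scale). The math is the same in any language; here is an illustrative Kotlin sketch (the function name, the cell layout, and the 0..1 normalization are all assumptions):
// Map a data point (x, y), already normalized to 0..1, into cell (row, col)
// of a scatterplot matrix where each cell is cellSize pixels with a margin around it.
fun toScreen(x: Double, y: Double, row: Int, col: Int, cellSize: Int, margin: Int): Pair<Int, Int> {
    val originX = margin + col * (cellSize + margin)
    val originY = margin + row * (cellSize + margin)
    val screenX = originX + (x * cellSize).toInt()
    val screenY = originY + (y * cellSize).toInt()
    return Pair(screenX, screenY)
}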
There are also other ways you can go about this. OpenGL has very powerful coordinate transformation functions and matrix math capabilities. Those may become more useful as you develop increasingly elaborate programs. They're most useful if you're going to be moving things around the screen in real time, or operating on very large data sets, as they allow you to perform these calculations very quickly on your graphics hardware (which can do them much faster than the CPU). However, for simple calculations like these, done once or infrequently on limited sets of data, the CPU is easily fast enough on today's computers.