I have a mesh loaded from an .obj file:
o Plane_Plane.002
v 1.000000 0.000000 1.000000
v -1.000000 0.000000 1.000000
v 1.000000 0.000000 -1.000000
v -1.000000 0.000000 -1.000000
vt 0.000100 0.000100
vt 0.999900 0.000100
vt 0.999900 0.999900
vt 0.000100 0.999900
vn 0.000000 1.000000 0.000000
usemtl None
s off
f 2/1/1 1/2/1 3/3/1
f 4/4/1 2/1/1 3/3/1
and I create a vertex buffer with the data order:
PosX,PosY,PosZ,NormX,NormY,NormZ,TexX,TexY
Now, do I have to generate indices to draw this plane like 0,1,2,0,2,3, or like 0,1,2,3,4,5, since I already created 6 vertices in my vertex buffer? I'm really confused here :(
You can use a map with the vertex as the key and the index as the value.
Starting from int counter = 0, traverse all vertices: if the map does not already contain this vertex, set the index of this vertex to counter and store map[vertex] = counter++.
Otherwise, the index is simply map[vertex].
(If you keep all 6 expanded vertices, you can of course draw with indices 0,1,2,3,4,5, or with no index buffer at all; the map is what lets you collapse the duplicates and draw 4 vertices with indices 0,1,2,0,2,3.)
Of course you will need to overload the < operator for whatever type you use as the vertex, since the map expects its key to be comparable.
Here is a sample of using a map to unify vertices:
#include <iostream>
#include <map>
using namespace std;

struct Point
{
    float x;
    float y;
    float z;

    Point()
    {
    }

    Point(float _x, float _y, float _z)
    {
        x = _x;
        y = _y;
        z = _z;
    }

    // Lexicographic comparison: only fall through to the next component
    // when the previous ones are equal. (Comparing each component
    // independently would not be a strict weak ordering, and std::map
    // would misbehave.)
    bool operator<(const Point& p) const
    {
        if (x != p.x)
            return x < p.x;
        if (y != p.y)
            return y < p.y;
        return z < p.z;
    }
};

int main()
{
    Point p[4];
    p[0] = Point(0, 0, 0);
    p[1] = Point(1, 1, 1);
    p[2] = Point(0, 0, 0);
    p[3] = Point(1, 1, 1);

    std::map<Point, int> indicesMap;
    int counter = 0;
    for (int i = 0; i < 4; i++)
    {
        if (indicesMap.find(p[i]) == indicesMap.end()) // new vertex
        {
            indicesMap[p[i]] = counter++;
        }
    }
    for (int i = 0; i < 4; i++)
    {
        std::cout << indicesMap[p[i]] << std::endl;
    }
}
The output will be 0 1 0 1 (one index per line), since p[2] equals p[0] and p[3] equals p[1].
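Since the vertex buffer layout in the question interleaves position, normal and texture coordinates, the map key should cover all eight floats, not just the position. Here is a minimal sketch of that (my own illustration; the Vertex struct, buildBuffers function and faceCorners parameter are hypothetical names, not from the question). std::tie gives a correct lexicographic operator< for free:

#include <map>
#include <tuple>
#include <vector>

// Full vertex matching the PosX,PosY,PosZ,NormX,NormY,NormZ,TexX,TexY layout.
struct Vertex {
    float px, py, pz;   // position
    float nx, ny, nz;   // normal
    float u, v;         // texture coordinates

    bool operator<(const Vertex& o) const {
        // std::tie compares component by component, lexicographically.
        return std::tie(px, py, pz, nx, ny, nz, u, v)
             < std::tie(o.px, o.py, o.pz, o.nx, o.ny, o.nz, o.u, o.v);
    }
};

// Build a deduplicated vertex buffer and an index buffer from the expanded
// face corners (6 of them for the plane in the question).
void buildBuffers(const std::vector<Vertex>& faceCorners,
                  std::vector<Vertex>& vertexBuffer,
                  std::vector<unsigned>& indexBuffer) {
    std::map<Vertex, unsigned> seen;
    for (const Vertex& v : faceCorners) {
        auto it = seen.find(v);
        if (it == seen.end()) {
            unsigned index = (unsigned)vertexBuffer.size();
            seen[v] = index;
            vertexBuffer.push_back(v);
            indexBuffer.push_back(index);
        } else {
            indexBuffer.push_back(it->second);
        }
    }
}

For the plane above, feeding in the 6 corners from the two f lines yields 4 unique vertices and the index list 0,1,2, 3,0,2, i.e. the four-vertex variant the question asks about.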
I have a first version of a function that inverts a matrix of size m using
magma_dgetrf_gpu and magma_dgetri_gpu, like this:
// Inversion
magma_dgetrf_gpu( m, m, d_a, m, piv, &info);
magma_dgetri_gpu( m, d_a, m, piv, dwork, ldwork, &info);
Now, I would like to invert using the Cholesky decomposition instead. The function looks like the first version, except for the routines used, which are:
// Inversion
magma_dpotrf_gpu( MagmaLower, m, d_a, m, &info);
magma_dpotri_gpu( MagmaLower, m, d_a, m, &info);
Here is the entire function that performs the inversion:
// ROUTINE MAGMA TO INVERT WITH CHOLESKY
void matrix_inverse_magma(vector<vector<double>> const &F_matrix, vector<vector<double>> &F_output) {
    // Index for loop and arrays
    int i, j, ip, idx;
    // Start magma part
    magma_int_t m = F_matrix.size();
    if (m) {
        magma_init();                                // initialize Magma
        magma_queue_t queue = NULL;
        magma_int_t dev = 0;
        magma_queue_create(dev, &queue);
        double gpu_time, *dwork;                     // dwork - workspace
        magma_int_t ldwork;                          // size of dwork
        magma_int_t *piv, info;                      // piv - array of indices of interchanged rows; a - mxm matrix
        magma_int_t mm = m * m;                      // size of a, r, c
        double *a;                                   // a - mxm matrix on the host
        double *d_a;                                 // d_a - mxm matrix a on the device
        magma_int_t err;
        ldwork = m * magma_get_dgetri_nb(m);         // optimal block size
        // allocate matrices
        err = magma_dmalloc_cpu(&a, mm);             // host memory for a
        // Convert the input matrix to the flat array a
        for (i = 0; i < m; i++) {
            for (j = 0; j < m; j++) {
                idx = i*m + j;
                a[idx] = F_matrix[i][j];
            }
        }
        err = magma_dmalloc(&d_a, mm);               // device memory for a
        err = magma_dmalloc(&dwork, ldwork);         // device memory for dwork
        piv = (magma_int_t *) malloc(m * sizeof(magma_int_t)); // host memory
        magma_dsetmatrix(m, m, a, m, d_a, m, queue); // copy a -> d_a
        // Inversion
        magma_dpotrf_gpu(MagmaLower, m, d_a, m, &info);
        magma_dpotri_gpu(MagmaLower, m, d_a, m, &info);
        magma_dgetmatrix(m, m, d_a, m, a, m, queue); // copy d_a -> a
        // Save final matrix
        for (i = 0; i < m; i++) {
            for (j = 0; j < m; j++) {
                idx = i*m + j;
                F_output[i][j] = a[idx];
            }
        }
        free(a);                    // free host memory
        free(piv);                  // free host memory
        magma_free(dwork);          // free device memory
        magma_free(d_a);            // free device memory
        magma_queue_destroy(queue); // destroy queue
        magma_finalize();
        // End magma part
    }
}
Unfortunately, after checking the output data, my implementation produces a wrong inverse.
I have doubts about the use of this line:
ldwork = m * magma_get_dgetri_nb( m ); // optimal block size
Could anyone see at first sight where the error comes from in my use of the dpotrf and dpotri functions (actually magma_dpotrf_gpu and magma_dpotri_gpu)?
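One thing worth checking (my observation, not part of the original question): dpotri, and hence magma_dpotri_gpu, writes only the triangle selected by its uplo argument, and it does not need the dgetri workspace at all. After dpotri with MagmaLower, the upper triangle of a still holds entries of the original matrix rather than the inverse, so copying out the full m-by-m array is wrong there. A possible fix is to mirror the triangle on the host, right after the magma_dgetmatrix call in the function above:

// Sketch: reflect the computed lower triangle of the inverse across the
// diagonal. MAGMA stores matrices column-major, so element (row r, col c)
// lives at a[c*m + r].
for (magma_int_t c = 0; c < m; c++)
    for (magma_int_t r = c + 1; r < m; r++)
        a[r*m + c] = a[c*m + r];   // upper (c,r) <- lower (r,c)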
EDIT 1:
Following the advice of Damir Tenishev, here is an example of a function that inverts a matrix using LAPACKE:
// LAPACK version
void matrix_inverse_lapack(vector<vector<double>> const &F_matrix, vector<vector<double>> &F_output) {
    // Index for loop and arrays
    int i, j, ip, idx;
    // Size of F_matrix
    int N = F_matrix.size();
    int *IPIV = new int[N];
    // Statement of main array to invert
    double *arr = new double[N*N];
    // Output diagonal block
    double *diag = new double[N];
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            idx = i*N + j;
            arr[idx] = F_matrix[i][j];
        }
    }
    // LAPACKE routines
    int info1 = LAPACKE_dgetrf(LAPACK_ROW_MAJOR, N, N, arr, N, IPIV);
    int info2 = LAPACKE_dgetri(LAPACK_ROW_MAJOR, N, arr, N, IPIV);
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            idx = i*N + j;
            F_output[i][j] = arr[idx];
        }
    }
    delete[] IPIV;
    delete[] arr;
    delete[] diag;
}
As you can see, this is a classical version of matrix inversion, which uses LAPACKE_dgetrf and LAPACKE_dgetri
EDIT 2: The MAGMA version is:
// MAGMA version
void matrix_inverse_magma(vector<vector<double>> const &F_matrix, vector<vector<double>> &F_output) {
    // Index for loop and arrays
    int i, j, ip, idx;
    // Start magma part
    magma_int_t m = F_matrix.size();
    if (m) {
        magma_init();                                // initialize Magma
        magma_queue_t queue = NULL;
        magma_int_t dev = 0;
        magma_queue_create(dev, &queue);
        double gpu_time, *dwork;                     // dwork - workspace
        magma_int_t ldwork;                          // size of dwork
        magma_int_t *piv, info;                      // piv - array of indices of interchanged rows; a - mxm matrix
        magma_int_t mm = m * m;                      // size of a, r, c
        double *a;                                   // a - mxm matrix on the host
        double *d_a;                                 // d_a - mxm matrix a on the device
        magma_int_t ione = 1;
        magma_int_t ISEED[4] = { 0, 0, 0, 1 };       // seed
        magma_int_t err;
        const double alpha = 1.0;                    // alpha = 1
        const double beta = 0.0;                     // beta = 0
        ldwork = m * magma_get_dgetri_nb(m);         // optimal block size
        // allocate matrices
        err = magma_dmalloc_cpu(&a, mm);             // host memory for a
        for (i = 0; i < m; i++) {
            for (j = 0; j < m; j++) {
                idx = i*m + j;
                a[idx] = F_matrix[i][j];
            }
        }
        err = magma_dmalloc(&d_a, mm);               // device memory for a
        err = magma_dmalloc(&dwork, ldwork);         // device memory for dwork
        piv = (magma_int_t *) malloc(m * sizeof(magma_int_t)); // host memory
        magma_dsetmatrix(m, m, a, m, d_a, m, queue); // copy a -> d_a
        // find the inverse matrix: d_a*X=I using the LU factorization
        // with partial pivoting and row interchanges computed by
        // magma_dgetrf_gpu; row i is interchanged with row piv(i);
        // d_a - mxm matrix; d_a is overwritten by the inverse
        magma_dgetrf_gpu(m, m, d_a, m, piv, &info);
        magma_dgetri_gpu(m, d_a, m, piv, dwork, ldwork, &info);
        magma_dgetmatrix(m, m, d_a, m, a, m, queue); // copy d_a -> a
        for (i = 0; i < m; i++) {
            for (j = 0; j < m; j++) {
                idx = i*m + j;
                F_output[i][j] = a[idx];
            }
        }
        free(a);                    // free host memory
        free(piv);                  // free host memory
        magma_free(dwork);          // free device memory
        magma_free(d_a);            // free device memory
        magma_queue_destroy(queue); // destroy queue
        magma_finalize();
        // End magma part
    }
}
As you can see, I have used the magma_dgetrf_gpu and magma_dgetri_gpu functions.
Now, I would like to do the same, either with LAPACKE or MAGMA+LAPACK, using the dpotrf and dpotri functions. I recall that the matrices I invert are symmetric.
EDIT 3: My attempts come from this documentation link.
In particular, see section 4.4.21, magma dpotri - invert a positive definite matrix in double precision, CPU interface, on page 325.
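Since EDIT 3 already points at the dpotri documentation, here is a hedged sketch of the LAPACKE route the question asks for (my own illustration, assuming the matrix is symmetric positive definite, which the Cholesky factorization requires). Note that dpotri fills only the requested triangle, so the result must be mirrored before use:

#include <lapacke.h>
#include <vector>
using std::vector;

// Sketch: Cholesky-based inversion, mirroring matrix_inverse_lapack above.
void matrix_inverse_lapack_chol(vector<vector<double>> const &F_matrix, vector<vector<double>> &F_output) {
    int N = F_matrix.size();
    vector<double> arr(N * N);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            arr[i*N + j] = F_matrix[i][j];
    int info1 = LAPACKE_dpotrf(LAPACK_ROW_MAJOR, 'L', N, arr.data(), N); // A = L*L^T
    int info2 = LAPACKE_dpotri(LAPACK_ROW_MAJOR, 'L', N, arr.data(), N); // lower triangle of inv(A)
    // dpotri writes only the 'L' triangle; copy it into the upper one.
    for (int i = 0; i < N; i++)
        for (int j = i + 1; j < N; j++)
            arr[i*N + j] = arr[j*N + i];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            F_output[i][j] = arr[i*N + j];
}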
The small program below creates a simple tf graph. I need to traverse the graph, printing information about the nodes as I go.
Is it right to assume that every graph has a root (or distinguished node)? I believe this graph has 3 nodes and I've heard that the edges are tensors.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "tensorflow/c/c_api.h"

TF_Graph* g;
TF_Status* s;

#define CHECK_OK(x) if(TF_OK != TF_GetCode(s))return printf("%s\n",TF_Message(s)),(void*)0

TF_Tensor* FloatTensor2x2(const float* values) {
    const int64_t dims[2] = {2, 2};
    TF_Tensor* t = TF_AllocateTensor(TF_FLOAT, dims, 2, sizeof(float) * 4);
    memcpy(TF_TensorData(t), values, sizeof(float) * 4);
    return t;
}

TF_Operation* FloatConst2x2(TF_Graph* graph, TF_Status* s, const float* values, const char* name) {
    TF_Tensor* tensor = FloatTensor2x2(values);
    TF_OperationDescription* desc = TF_NewOperation(graph, "Const", name);
    TF_SetAttrTensor(desc, "value", tensor, s);
    if (TF_GetCode(s) != TF_OK) return 0;
    TF_SetAttrType(desc, "dtype", TF_FLOAT);
    TF_Operation* op = TF_FinishOperation(desc, s);
    CHECK_OK(s);
    return op;
}

TF_Operation* MatMul(TF_Graph* graph, TF_Status* s, TF_Operation* l, TF_Operation* r, const char* name,
                     char transpose_a, char transpose_b) {
    TF_OperationDescription* desc = TF_NewOperation(graph, "MatMul", name);
    if (transpose_a) {
        TF_SetAttrBool(desc, "transpose_a", 1);
    }
    if (transpose_b) {
        TF_SetAttrBool(desc, "transpose_b", 1);
    }
    TF_AddInput(desc, (TF_Output){l, 0});
    TF_AddInput(desc, (TF_Output){r, 0});
    TF_Operation* op = TF_FinishOperation(desc, s);
    CHECK_OK(s);
    return op;
}

TF_Graph* BuildSuccessGraph(TF_Output* inputs, TF_Output* outputs) {
    //                  |
    //                 z|
    //                  |
    //               MatMul
    //              /      \
    //             ^        ^
    //             |        |
    //        x Const_0  y Const_1
    //
    float const0_val[] = {1.0, 2.0, 3.0, 4.0};
    float const1_val[] = {1.0, 0.0, 0.0, 1.0};
    TF_Operation* const0 = FloatConst2x2(g, s, const0_val, "Const_0");
    TF_Operation* const1 = FloatConst2x2(g, s, const1_val, "Const_1");
    TF_Operation* matmul = MatMul(g, s, const0, const1, "MatMul", 0, 0);
    inputs[0] = (TF_Output){const0, 0};
    inputs[1] = (TF_Output){const1, 0};
    outputs[0] = (TF_Output){matmul, 0};
    CHECK_OK(s);
    return g;
}

int main(int argc, char const *argv[]) {
    g = TF_NewGraph();
    s = TF_NewStatus();
    TF_Output inputs[2], outputs[1];
    BuildSuccessGraph(inputs, outputs);
    /* HERE traverse g -- maybe with {inputs,outputs} -- to print the graph */
    fprintf(stdout, "OK\n");
}
If someone could help with what functions to use to get info about the graph, it would be appreciated.
from c_api.h:
// Iterate through the operations of a graph. To use:
// size_t pos = 0;
// TF_Operation* oper;
// while ((oper = TF_GraphNextOperation(graph, &pos)) != nullptr) {
// DoSomethingWithOperation(oper);
// }
TF_CAPI_EXPORT extern TF_Operation* TF_GraphNextOperation(TF_Graph* graph,
size_t* pos);
Note this only returns operations and does not define a way to navigate from one node (Operation) to the next; the edge relationships are stored in the nodes themselves (as pointers).
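As an illustration, a minimal traversal that prints each node and follows its input edges back to the producing operations might look like this (all functions used below are from c_api.h; the output format is my own):

#include <stdio.h>
#include "tensorflow/c/c_api.h"

// Walk every operation in the graph and print its input edges.
void PrintGraph(TF_Graph* graph) {
    size_t pos = 0;
    TF_Operation* oper;
    while ((oper = TF_GraphNextOperation(graph, &pos)) != NULL) {
        printf("node '%s' (type: %s), %d input(s), %d output(s)\n",
               TF_OperationName(oper), TF_OperationOpType(oper),
               TF_OperationNumInputs(oper), TF_OperationNumOutputs(oper));
        // Each input edge points back at an output of the producing node.
        for (int i = 0; i < TF_OperationNumInputs(oper); i++) {
            TF_Input in = {oper, i};
            TF_Output producer = TF_OperationInput(in);
            printf("  input %d <- '%s':%d\n",
                   i, TF_OperationName(producer.oper), producer.index);
        }
    }
}

Calling PrintGraph(g) at the HERE comment in main would list Const_0, Const_1 and MatMul, with MatMul reporting its two input edges. This also addresses the root question: a TF_Graph has no distinguished root node; you either iterate over all operations as above, or start from the TF_Output handles you kept (here outputs[0]) and walk backwards through the inputs.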
I'm trying to compute batch 1D FFTs using cufftPlanMany. The data set comes from a 3D field, stored in a 1D array, where I want to compute 1D FFTs in the x and y directions. The data is stored contiguously in x, then y, then z; in the example below, nx = ny = 4 and nz = 2, so elements {0,1,2,3} form the first x-row.
Doing batch FFTs in the x-direction is (I believe) straightforward: with input stride=1, distance=nx and batch=ny * nz, it computes the FFTs over elements {0,1,2,3}, {4,5,6,7}, ..., {28,29,30,31}. However, I can't think of a way to achieve the same for the FFTs in the y-direction. A batch for each xy plane is again straightforward (input stride=nx, dist=1, batch=nx results in FFTs over {0,4,8,12}, {1,5,9,13}, etc.). But with batch=nx * nz, going from {3,7,11,15} to {16,20,24,28}, the distance is larger than 1. Can this somehow be done with cufftPlanMany?
I think the short answer to your question (the possibility of using a single cufftPlanMany to perform 1D FFTs of the columns of a 3D matrix) is NO.
Indeed, transforms performed with cufftPlanMany, which you call like
cufftPlanMany(&handle, rank, n,
inembed, istride, idist,
onembed, ostride, odist, CUFFT_C2C, batch);
must obey the Advanced Data Layout. In particular, 1D FFTs are worked out according to the following layout
input[b * idist + x * istride]
where b addresses the b-th signal and istride is the distance between two consecutive items of the same signal. If the 3D matrix has dimensions M * N * Q and you want to perform 1D transforms along the columns, then the distance between two consecutive elements will be M, while the distance between two consecutive signals will be 1. Furthermore, the number of batched executions must be set equal to M. With those parameters, you can cover only one slice of the 3D matrix. Indeed, if you try increasing the batch size beyond M, then cuFFT will start trying to compute new column-wise FFTs starting from the second row. The only solution to this problem is an iterative call to cufftExecC2C to cover all the Q slices.
For the record, the following code provides a fully worked example of how to perform 1D FFTs of the columns of a 3D matrix.
#include <thrust/device_vector.h>
#include <cufft.h>

/********************/
/* CUDA ERROR CHECK */
/********************/
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}

int main() {
    const int M = 3;
    const int N = 4;
    const int Q = 2;

    thrust::host_vector<float2> h_matrix(M * N * Q);

    for (int k=0; k<Q; k++)
        for (int j=0; j<N; j++)
            for (int i=0; i<M; i++) {
                float2 temp;
                temp.x = (float)(j + k * M);
                //temp.x = 1.f;
                temp.y = 0.f;
                h_matrix[k*M*N+j*M+i] = temp;
                printf("%i %i %i %f %f\n", i, j, k, temp.x, temp.y);
            }
    printf("\n");

    thrust::device_vector<float2> d_matrix(h_matrix);
    thrust::device_vector<float2> d_matrix_out(M * N * Q);

    // --- Advanced data layout
    //     input[b * idist + x * istride]
    //     output[b * odist + x * ostride]
    //     b = signal number
    //     x = element of the b-th signal

    cufftHandle handle;
    int rank = 1;                   // --- 1D FFTs
    int n[] = { N };                // --- Size of the Fourier transform
    int istride = M, ostride = M;   // --- Distance between two successive input/output elements
    int idist = 1, odist = 1;       // --- Distance between batches
    int inembed[] = { 0 };          // --- Input size with pitch (ignored for 1D transforms)
    int onembed[] = { 0 };          // --- Output size with pitch (ignored for 1D transforms)
    int batch = M;                  // --- Number of batched executions
    cufftPlanMany(&handle, rank, n,
                  inembed, istride, idist,
                  onembed, ostride, odist, CUFFT_C2C, batch);

    for (int k=0; k<Q; k++)
        cufftExecC2C(handle, (cufftComplex*)(thrust::raw_pointer_cast(d_matrix.data()) + k * M * N),
                             (cufftComplex*)(thrust::raw_pointer_cast(d_matrix_out.data()) + k * M * N), CUFFT_FORWARD);
    cufftDestroy(handle);

    for (int k=0; k<Q; k++)
        for (int j=0; j<N; j++)
            for (int i=0; i<M; i++) {
                float2 temp = d_matrix_out[k*M*N+j*M+i];
                printf("%i %i %i %f %f\n", i, j, k, temp.x, temp.y);
            }
}
The situation is different when you want to perform 1D transforms of the rows. In that case, the distance between two consecutive elements is 1, while the distance between two consecutive signals is M. This allows you to set up N * Q transforms and invoke cufftExecC2C only once. For the record, the code below provides a full example of 1D transforms of the rows of a 3D matrix.
#include <thrust/device_vector.h>
#include <cufft.h>

/********************/
/* CUDA ERROR CHECK */
/********************/
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}

int main() {
    const int M = 3;
    const int N = 4;
    const int Q = 2;

    thrust::host_vector<float2> h_matrix(M * N * Q);

    for (int k=0; k<Q; k++)
        for (int j=0; j<N; j++)
            for (int i=0; i<M; i++) {
                float2 temp;
                temp.x = (float)(j + k * M);
                //temp.x = 1.f;
                temp.y = 0.f;
                h_matrix[k*M*N+j*M+i] = temp;
                printf("%i %i %i %f %f\n", i, j, k, temp.x, temp.y);
            }
    printf("\n");

    thrust::device_vector<float2> d_matrix(h_matrix);
    thrust::device_vector<float2> d_matrix_out(M * N * Q);

    // --- Advanced data layout
    //     input[b * idist + x * istride]
    //     output[b * odist + x * ostride]
    //     b = signal number
    //     x = element of the b-th signal

    cufftHandle handle;
    int rank = 1;                   // --- 1D FFTs
    int n[] = { M };                // --- Size of the Fourier transform
    int istride = 1, ostride = 1;   // --- Distance between two successive input/output elements
    int idist = M, odist = M;       // --- Distance between batches
    int inembed[] = { 0 };          // --- Input size with pitch (ignored for 1D transforms)
    int onembed[] = { 0 };          // --- Output size with pitch (ignored for 1D transforms)
    int batch = N * Q;              // --- Number of batched executions
    cufftPlanMany(&handle, rank, n,
                  inembed, istride, idist,
                  onembed, ostride, odist, CUFFT_C2C, batch);

    cufftExecC2C(handle, (cufftComplex*)(thrust::raw_pointer_cast(d_matrix.data())),
                         (cufftComplex*)(thrust::raw_pointer_cast(d_matrix_out.data())), CUFFT_FORWARD);
    cufftDestroy(handle);

    for (int k=0; k<Q; k++)
        for (int j=0; j<N; j++)
            for (int i=0; i<M; i++) {
                float2 temp = d_matrix_out[k*M*N+j*M+i];
                printf("%i %i %i %f %f\n", i, j, k, temp.x, temp.y);
            }
}
I guess, idist=nx*nz could also jump a whole plane and batch=nz would then cover one yx plane. The decision should be made according to whether nx or nz is larger.
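For completeness, here is a sketch of the slice-loop approach in the question's nx/ny/nz notation (my own illustration; it assumes d_data points to nx*ny*nz cufftComplex values stored contiguously in x, then y, then z, and transforms in place):

#include <cufft.h>

// 1D FFTs along y: one batched execution per xy-plane, looped over z.
void fft_y_direction(cufftComplex* d_data, int nx, int ny, int nz)
{
    cufftHandle plan;
    int n[] = { ny };                           // transform length along y
    int inembed[] = { 0 }, onembed[] = { 0 };   // ignored for 1D transforms
    cufftPlanMany(&plan, 1, n,
                  inembed, /*istride=*/nx, /*idist=*/1,
                  onembed, /*ostride=*/nx, /*odist=*/1,
                  CUFFT_C2C, /*batch=*/nx);
    for (int k = 0; k < nz; ++k)                // advance one xy-plane per iteration
        cufftExecC2C(plan, d_data + k * nx * ny, d_data + k * nx * ny, CUFFT_FORWARD);
    cufftDestroy(plan);
}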
Translating Obj-C to Swift. As you can see, I declared let buf = UnsafeMutablePointer<UInt8>(CVPixelBufferGetBaseAddress(cvimgRef)), so I'm getting the error in the for loop below it:
Binary operator '+=' cannot be applied to operands of type 'Int' and 'UInt8'
Also, as a little addendum, I don't know how to translate the remaining Obj-C code below the for loop. What does that slash mean, and how do I deal with the pointer? Do I have to use something like UnsafeMutableFloat somewhere?
// process the frame of video
func captureOutput(captureOutput: AVCaptureOutput, didOutputSampleBuffer sampleBuffer: CMSampleBuffer, fromConnection connection: AVCaptureConnection) {
    // if we're paused don't do anything
    if currentState == CurrentState.statePaused {
        // reset our frame counter
        self.validFrameCounter = 0
        return
    }
    // this is the image buffer
    var cvimgRef: CVImageBufferRef = CMSampleBufferGetImageBuffer(sampleBuffer)
    // Lock the image buffer
    CVPixelBufferLockBaseAddress(cvimgRef, 0)
    // access the data
    var width: size_t = CVPixelBufferGetWidth(cvimgRef)
    var height: size_t = CVPixelBufferGetHeight(cvimgRef)
    // get the raw image bytes
    let buf = UnsafeMutablePointer<UInt8>(CVPixelBufferGetBaseAddress(cvimgRef))
    var bprow: size_t = CVPixelBufferGetBytesPerRow(cvimgRef)
    var r = 0
    var g = 0
    var b = 0
    for var y = 0; y < height; y++ {
        for var x = 0; x < width * 4; x += 4 {
            b += buf[x]; g += buf[x + 1]; r += buf[x + 2] // error
        }
        buf += bprow() // error
    }
Remaining Obj-C code.
r/=255*(float) (width*height);
g/=255*(float) (width*height);
b/=255*(float) (width*height);
You have a lot of type mismatch errors.
The type of x should not be UInt8, because x increases up to the value of width * 4.
for var x:UInt8 = 0; x < width * 4; x += 4 { // error: '<' cannot be applied to operands of type 'UInt8' and 'Int'
So fix it like below:
for var x = 0; x < width * 4; x += 4 {
To increment the pointer address, you can use the advancedBy() function.
buf += bprow(UnsafeMutablePointer(UInt8)) // error: '+=' cannot be applied to operands of type 'UnsafeMutablePointer<UInt8>' and 'size_t'
Like below:
var pixel = buf.advancedBy(y * bprow)
And this line,
RGBtoHSV(r, g, b) // error
There are no implicit casts in Swift between CGFloat and Float unfortunately. So you should cast explicitly to CGFloat.
RGBtoHSV(CGFloat(r), g: CGFloat(g), b: CGFloat(b))
The whole edited code is here:
func RGBtoHSV(r: CGFloat, g: CGFloat, b: CGFloat) -> (h: CGFloat, s: CGFloat, v: CGFloat) {
    var h: CGFloat = 0.0
    var s: CGFloat = 0.0
    var v: CGFloat = 0.0
    let col = UIColor(red: r, green: g, blue: b, alpha: 1.0)
    col.getHue(&h, saturation: &s, brightness: &v, alpha: nil)
    return (h, s, v)
}

// process the frame of video
func captureOutput(captureOutput: AVCaptureOutput, didOutputSampleBuffer sampleBuffer: CMSampleBuffer, fromConnection connection: AVCaptureConnection) {
    // if we're paused don't do anything
    if currentState == CurrentState.statePaused {
        // reset our frame counter
        self.validFrameCounter = 0
        return
    }
    // this is the image buffer
    var cvimgRef = CMSampleBufferGetImageBuffer(sampleBuffer)
    // Lock the image buffer
    CVPixelBufferLockBaseAddress(cvimgRef, 0)
    // access the data
    var width = CVPixelBufferGetWidth(cvimgRef)
    var height = CVPixelBufferGetHeight(cvimgRef)
    // get the raw image bytes
    let buf = UnsafeMutablePointer<UInt8>(CVPixelBufferGetBaseAddress(cvimgRef))
    var bprow = CVPixelBufferGetBytesPerRow(cvimgRef)
    var r: Float = 0.0
    var g: Float = 0.0
    var b: Float = 0.0
    for var y = 0; y < height; y++ {
        var pixel = buf.advancedBy(y * bprow)
        for var x = 0; x < width * 4; x += 4 {
            b += Float(pixel[x])
            g += Float(pixel[x + 1])
            r += Float(pixel[x + 2])
        }
    }
    r /= 255 * Float(width * height)
    g /= 255 * Float(width * height)
    b /= 255 * Float(width * height)
    // convert from rgb to hsv colourspace
    let (h, s, v) = RGBtoHSV(CGFloat(r), g: CGFloat(g), b: CGFloat(b))
}
Important update: I already figured out the answers and put them in this simple open-source library: http://bartolsthoorn.github.com/NVDSP/ Check it out, it will probably save you quite some time if you're having trouble with audio filters in iOS!
I have created a (realtime) audio buffer (float *data) that holds a few sin(theta) waves of different frequencies.
The code below shows how I create the buffer. I've tried to apply a bandpass filter, but it just turns the signals into noise/blips:
// Multiple signal generator
__block float *phases = nil;

[audioManager setOutputBlock:^(float *data, UInt32 numFrames, UInt32 numChannels)
{
    float samplingRate = audioManager.samplingRate;
    NSUInteger activeSignalCount = [tones count];

    // Initialize phases
    if (phases == nil) {
        phases = new float[10];
        for (int z = 0; z < 10; z++) {
            phases[z] = 0.0;
        }
    }

    // Multiple signals
    NSEnumerator *enumerator = [tones objectEnumerator];
    id frequency;
    UInt32 c = 0;
    while (frequency = [enumerator nextObject])
    {
        for (int i = 0; i < numFrames; ++i)
        {
            for (int iChannel = 0; iChannel < numChannels; ++iChannel)
            {
                float theta = phases[c] * M_PI * 2;
                if (c == 0) {
                    data[i*numChannels + iChannel] = sin(theta);
                } else {
                    data[i*numChannels + iChannel] = data[i*numChannels + iChannel] + sin(theta);
                }
            }
            phases[c] += 1.0 / (samplingRate / [frequency floatValue]);
            if (phases[c] > 1.0) phases[c] = -1;
        }
        c++;
    }

    // Normalize data with active signal count
    float signalMulti = 1.0 / (float(activeSignalCount) * (sqrt(2.0)));
    vDSP_vsmul(data, 1, &signalMulti, data, 1, numFrames*numChannels);

    // Apply master volume
    float volume = masterVolumeSlider.value;
    vDSP_vsmul(data, 1, &volume, data, 1, numFrames*numChannels);

    if (fxSwitch.isOn) {
        // H(s) = (s/Q) / (s^2 + s/Q + 1)
        // http://www.musicdsp.org/files/Audio-EQ-Cookbook.txt
        // BW 2.0 Q 0.667
        // http://www.rane.com/note170.html
        // The order of the coefficients are, B1, B2, A1, A2, B0.
        float Fs = samplingRate;
        float omega = 2*M_PI*Fs;        // w0 = 2*pi*f0/Fs
        float Q = 0.50f;
        float alpha = sin(omega)/(2*Q); // sin(w0)/(2*Q)

        // Through H
        for (int i = 0; i < numFrames; ++i)
        {
            for (int iChannel = 0; iChannel < numChannels; ++iChannel)
            {
                data[i*numChannels + iChannel] = (data[i*numChannels + iChannel]/Q) / (pow(data[i*numChannels + iChannel],2) + data[i*numChannels + iChannel]/Q + 1);
            }
        }

        float b0 = alpha;
        float b1 = 0;
        float b2 = -alpha;
        float a0 = 1 + alpha;
        float a1 = -2*cos(omega);
        float a2 = 1 - alpha;

        float *coefficients = (float *) calloc(5, sizeof(float));
        coefficients[0] = b1;
        coefficients[1] = b2;
        coefficients[2] = a1;
        coefficients[3] = a2;
        coefficients[3] = b0; // note: this assigns index 3 a second time; index 4 is never set
        vDSP_deq22(data, 2, coefficients, data, 2, numFrames);
        free(coefficients);
    }

    // Measure dB
    [self measureDB:data:numFrames:numChannels];
}];
My aim is to make a 10-band EQ for this buffer using vDSP_deq22. The syntax of the method is:
vDSP_deq22(<float *vDSP_A>, <vDSP_Stride vDSP_I>, <float *vDSP_B>, <float *vDSP_C>, <vDSP_Stride vDSP_K>, <vDSP_Length __vDSP_N>)
See: http://developer.apple.com/library/mac/#documentation/Accelerate/Reference/vDSPRef/Reference/reference.html#//apple_ref/doc/c_ref/vDSP_deq22
Arguments:
float *vDSP_A is the input data
float *vDSP_B are 5 filter coefficients
float *vDSP_C is the output data
I have to make 10 filters (10 calls to vDSP_deq22), then set the gain for every band and combine them back together. But what coefficients do I feed each filter? I know vDSP_deq22 is a 2nd-order (biquad) IIR filter, but how do I turn it into a bandpass?
Now I have three questions:
a) Do I have to de-interleave and interleave the audio buffer? I know setting the stride to 2 filters just one channel, but how do I filter the other? A stride of 1 would process both channels as one.
b) Do I have to transform/process the buffer before it enters the vDSP_deq22 method? If so, do I also have to transform it back afterwards?
c) What coefficient values should I pass to the 10 vDSP_deq22 calls?
I've been trying for days now but I haven't been able to figure this one out; please help me out!
Your omega value needs to be normalised, i.e. expressed as a fraction of Fs. It looks like you left out the f0 when you calculated omega, which will make alpha wrong too:
float omega = 2*M_PI*Fs; // w0 = 2*pi*f0/Fs
should probably be:
float omega = 2*M_PI*f0/Fs; // w0 = 2*pi*f0/Fs
where f0 is the centre frequency in Hz.
For your 10 band equaliser you'll need to pick 10 values of f0, spaced logarithmically, e.g. 25 Hz, 50 Hz, 100 Hz, 200 Hz, 400 Hz, 800 Hz, 1.6 kHz, 3.2 kHz, 6.4 kHz, 12.8 kHz.
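Building on that, here is a hedged sketch (my own illustration, not from the original answer) of computing one band's coefficients from the cookbook bandpass formulas already quoted in the question (constant 0 dB peak gain), normalised by a0. It assumes vDSP_deq22's difference equation is C[n] = B[0]*A[n] + B[1]*A[n-1] + B[2]*A[n-2] - B[3]*C[n-1] - B[4]*C[n-2], i.e. a coefficient order of {b0, b1, b2, a1, a2}; check that against the vDSP reference before relying on it:

#include <math.h>

// Sketch: RBJ audio-EQ-cookbook bandpass coefficients for one band,
// normalised by a0 and ordered {b0, b1, b2, a1, a2} for vDSP_deq22.
void bandpassCoefficients(float f0, float Fs, float Q, float coeffs[5])
{
    float w0    = 2.0f * M_PI * f0 / Fs;  // normalised centre frequency (the fix above)
    float alpha = sinf(w0) / (2.0f * Q);
    float a0    = 1.0f + alpha;
    coeffs[0] =  alpha / a0;              // b0
    coeffs[1] =  0.0f;                    // b1
    coeffs[2] = -alpha / a0;              // b2
    coeffs[3] = -2.0f * cosf(w0) / a0;    // a1
    coeffs[4] = (1.0f - alpha) / a0;      // a2
}

Regarding question a), one option is to keep the buffer interleaved and run vDSP_deq22 twice per band with a stride of 2, once starting at data for one channel and once at data + 1 for the other, so each channel is filtered independently.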