How to separate the Y plane, U plane and UV plane from YUV bi-planar data in iOS? - objective-c

In my application I use AVCaptureVideo, and I get video in kCVPixelFormatType_420YpCbCr8BiPlanarFullRange format.
Now I am getting the Y plane and the UV plane from the image buffer:
CVPlanarPixelBufferInfo_YCbCrBiPlanar *planar = CVPixelBufferGetBaseAddress(imageBuffer);
// the offsets are 32-bit big-endian values, so swap them as 32-bit integers
size_t y_offset = NSSwapBigIntToHost(planar->componentInfoY.offset);
size_t uv_offset = NSSwapBigIntToHost(planar->componentInfoCbCr.offset);
Here the YUV data is in bi-planar format (Y + UV).
What is the UV plane? Is it in uuuu...vvvv format or uvuvuvuv format?
How do I get the U plane and the V plane separately?
Can anyone please help me?

The Y plane represents the luminance component, and the UV plane represents the Cb and Cr chroma components.
In the case of the kCVPixelFormatType_420YpCbCr8BiPlanarFullRange format, the luma plane is 8bpp with the same dimensions as your video, while the chroma plane is 16bpp but has only half the width and half the height of the original video (a quarter of the pixels). Each pixel on this plane holds one Cb and one Cr component.
So if your input video is 352x288, your Y plane will be 352x288 at 8bpp, and your CbCr plane 176x144 at 16bpp. This works out to the same amount of data as a 12bpp 352x288 image: half of what RGB888 requires, and still less than RGB565.
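The size arithmetic above can be written as a small C sketch. `biplanar_420_sizes` and `BiPlanarSizes` are hypothetical names, and the sketch assumes an even width and height with no row padding:

```c
#include <stddef.h>

/* Byte sizes of the two planes of an 8-bit 4:2:0 bi-planar frame
 * (hypothetical helper; assumes even dimensions and no row padding). */
typedef struct {
    size_t y_bytes;     /* luma plane */
    size_t cbcr_bytes;  /* interleaved chroma plane */
} BiPlanarSizes;

BiPlanarSizes biplanar_420_sizes(size_t width, size_t height)
{
    BiPlanarSizes s;
    s.y_bytes = width * height;                    /* 1 byte per luma sample */
    s.cbcr_bytes = (width / 2) * (height / 2) * 2; /* one Cb + one Cr per 2x2 block */
    return s;
}
```

For 352x288 this gives 101376 bytes of Y and 50688 bytes of CbCr, 152064 bytes in total, which is exactly 12 bits per pixel. Real CVPixelBuffers may pad each row, so the bytes per row of a plane can be larger than these figures suggest.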
So in the buffer, Y will look like this
[YYYYY . . . ]
and UV
[UVUVUVUVUV . . .]
vs RGB being, of course,
[RGBRGBRGB . . . ]
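Because the CbCr plane is interleaved ([UVUV...]), separate U and V planes have to be built by de-interleaving it. A minimal sketch follows; `split_uv_plane` is a hypothetical helper, not a CoreVideo API. On iOS the inputs would come from CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1), CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 1), CVPixelBufferGetWidthOfPlane(pixelBuffer, 1) and CVPixelBufferGetHeightOfPlane(pixelBuffer, 1), with the base address locked while reading:

```c
#include <stddef.h>
#include <stdint.h>

/* De-interleave a bi-planar CbCr plane ([UVUV...]) into separate U (Cb)
 * and V (Cr) planes. chroma_width/chroma_height are the chroma plane's
 * dimensions in pixels (half the luma size for 4:2:0); uv_stride is the
 * source plane's bytes per row, which may include padding. The outputs are
 * tightly packed, chroma_width bytes per row. */
void split_uv_plane(const uint8_t *uv, size_t uv_stride,
                    size_t chroma_width, size_t chroma_height,
                    uint8_t *u_out, uint8_t *v_out)
{
    for (size_t row = 0; row < chroma_height; row++) {
        const uint8_t *src = uv + row * uv_stride;
        for (size_t col = 0; col < chroma_width; col++) {
            u_out[row * chroma_width + col] = src[2 * col];     /* Cb */
            v_out[row * chroma_width + col] = src[2 * col + 1]; /* Cr */
        }
    }
}
```

Each output plane holds chroma_width * chroma_height bytes, so for a 352x288 frame you would allocate 176 * 144 bytes for U and the same for V.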

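The code below copies the YUV data from a pixel buffer whose format is kCVPixelFormatType_420YpCbCr8BiPlanarFullRange into one contiguous buffer. It copies row by row because a plane's bytes per row can be larger than its width (rows may be padded), and it assumes an even width and height.
CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
size_t pixelWidth = CVPixelBufferGetWidth(pixelBuffer);
size_t pixelHeight = CVPixelBufferGetHeight(pixelBuffer);
// Y size in bytes (1 byte per pixel)
size_t y_size = pixelWidth * pixelHeight;
// UV size in bytes (interleaved CbCr at half width and half height)
size_t uv_size = y_size / 2;
// the caller is responsible for free(yuv_frame)
uint8_t *yuv_frame = malloc(y_size + uv_size);
// get base address and bytes per row of the Y plane
uint8_t *y_frame = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
size_t y_stride = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0);
// copy Y data, skipping any row padding
for (size_t row = 0; row < pixelHeight; row++)
    memcpy(yuv_frame + row * pixelWidth, y_frame + row * y_stride, pixelWidth);
// get base address and bytes per row of the UV plane
uint8_t *uv_frame = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1);
size_t uv_stride = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 1);
// copy UV data: pixelHeight / 2 rows of pixelWidth bytes (Cb and Cr for pixelWidth / 2 pixels)
for (size_t row = 0; row < pixelHeight / 2; row++)
    memcpy(yuv_frame + y_size + row * pixelWidth, uv_frame + row * uv_stride, pixelWidth);
CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);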

Related

How to teleport a player two blocks to his left?

I tried several things, like using vectors, but it didn't work for me. Then I tried searching on the internet, and that didn't work either.
Vector direc = l.getDirection().normalize();
direc.setY(l.getY());
direc.normalize();
direc.multiply(-1);
l.add(direc);
player.teleport(l.getBlock().getLocation());
// or
player.teleport(l);
Use Vector#rotateAroundY to rotate the player's direction vector 90 degrees to the left:
Vector dir = player.getLocation().getDirection(); // get player's direction vector
dir.setY(0).normalize(); // get rid of the y component
dir.rotateAroundY(Math.PI / 2); // rotate it 90 degrees to the left
dir.multiply(2); // make the vector's length 2
Location newLocation = player.getLocation().add(dir); // add the vector to the player's location to get the new location
Location location = player.getLocation();
int side = -1; // -1 = left, +1 = right, relative to where the player is facing
double angle = Math.toRadians(location.getYaw() + 90 * side);
// yaw 0 faces +Z, so a heading of `angle` points along (-sin(angle), cos(angle)) in x/z
double newX = location.getX() - 2 * Math.sin(angle); // 2 is the number of blocks
double newZ = location.getZ() + 2 * Math.cos(angle);
player.teleport(new Location(location.getWorld(), newX, location.getY(), newZ, location.getYaw(), location.getPitch()));
You have to know in which direction (left or right) you want to teleport the player.

Accelerate framework cepstrum peak finding

I'm trying to find the peak values of a cepstrum analysis with the Accelerate framework, but I always get the peak values at the end or at the beginning of the frames. I'm analysing audio from the microphone in real time. What is wrong with my code? My code is below:
OSStatus microphoneInputCallback (void *inRefCon,
AudioUnitRenderActionFlags *ioActionFlags,
const AudioTimeStamp *inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList *ioData){
// get reference of test app we need for test app attributes
TestApp *this = (TestApp *)inRefCon;
COMPLEX_SPLIT complexArray = this->fftA;
void *dataBuffer = this->dataBuffer;
float *outputBuffer = this->outputBuffer;
FFTSetup fftSetup = this->fftSetup;
uint32_t log2n = this->fftLog2n;
uint32_t n = this->fftN; // 4096
uint32_t nOver2 = this->fftNOver2;
uint32_t stride = 1;
int bufferCapacity = this->fftBufferCapacity; // 4096
SInt16 index = this->fftIndex;
OSStatus renderErr;
// observation objects
float *observerBufferRef = this->observerBuffer;
int observationCountRef = this->observationCount;
renderErr = AudioUnitRender(rioUnit, ioActionFlags,
inTimeStamp, bus1, inNumberFrames, this->bufferList);
if (renderErr < 0) {
return renderErr;
}
// Fill the buffer with our sampled data. If we fill our buffer, run the
// fft.
int read = bufferCapacity - index;
if (read > inNumberFrames) {
memcpy((SInt16 *)dataBuffer + index, this->bufferList->mBuffers[0].mData, inNumberFrames*sizeof(SInt16));
this->fftIndex += inNumberFrames;
} else {
// If we enter this conditional, our buffer will be filled and we should PERFORM FFT.
memcpy((SInt16 *)dataBuffer + index, this->bufferList->mBuffers[0].mData, read*sizeof(SInt16));
// Reset the index.
this->fftIndex = 0;
/*************** FFT ***************/
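// We want to deal with only floating point values here.
vDSP_vflt16((SInt16 *) dataBuffer, stride, (float *) outputBuffer, stride, bufferCapacity );
// multiply by the window in place, after the conversion: vDSP_vmul works on floats,
// and a window applied before vDSP_vflt16 would be overwritten by the conversion
vDSP_vmul(outputBuffer, 1, this->window, 1, outputBuffer, 1, n);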
/**
Look at the real signal as an interleaved complex vector by casting it.
Then call the transformation function vDSP_ctoz to get a split complex
vector, which for a real signal, divides into an even-odd configuration.
*/
vDSP_ctoz((COMPLEX*)outputBuffer, 2, &complexArray, 1, nOver2);
// Carry out a Forward FFT transform.
vDSP_fft_zrip(fftSetup, &complexArray, stride, log2n, FFT_FORWARD);
vDSP_ztoc(&complexArray, 1, (COMPLEX *)outputBuffer, 2, nOver2);
complexArray.imagp[0] = 0.0f;
vDSP_zvmags(&complexArray, 1, complexArray.realp, 1, nOver2);
bzero(complexArray.imagp, (nOver2) * sizeof(float));
// scale
float scale = 1.0f / (2.0f*(float)n);
vDSP_vsmul(complexArray.realp, 1, &scale, complexArray.realp, 1, nOver2);
// step 2: get the log magnitude for the cepstrum (small offset avoids logf(0) = -inf)
float *logmag = malloc(sizeof(float)*nOver2);
for (int i=0; i < nOver2; i++)
logmag[i] = logf(sqrtf(complexArray.realp[i]) + 1e-20f);
// pack the nOver2 real values as nOver2/2 split-complex pairs (a count of nOver2 would read past logmag)
vDSP_ctoz((COMPLEX*)logmag, 2, &complexArray, 1, nOver2/2);
// create cepstrum (a real FFT of nOver2 points, hence log2n-1)
vDSP_fft_zrip(fftSetup, &complexArray, stride, log2n-1, FFT_INVERSE);
// convert split complex back to nOver2 interleaved real values
float *displayData = malloc(sizeof(float)*nOver2);
vDSP_ztoc(&complexArray, 1, (COMPLEX*)displayData, 2, nOver2/2);
float dominantFrequency = 0;
int currentBin = 0;
float dominantFrequencyAmp = 0;
// find peak of cepstrum
for (int i=0; i < nOver2; i++){
//get current frequency magnitude
if (displayData[i] > dominantFrequencyAmp) {
// DLog("Bufferer filled %f", displayData[i]);
dominantFrequencyAmp = displayData[i];
currentBin = i;
}
}
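DLog("currentBin : %i amplitude: %f", currentBin, dominantFrequencyAmp);
// release the per-callback buffers (better still, allocate them once outside the render callback)
free(logmag);
free(displayData);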
}
return noErr;
}
I haven't worked with the Accelerate Framework, but your code appears to be taking the proper steps to calculate the Cepstrum.
The Cepstrum of real acoustic signals tends to have a very large DC component, a large peak at and near zero quefrency [sic]. Just ignore the near-DC portion of the Cepstrum and look for peaks above 20 Hz frequency (above quefrency of Cepstrum_Width/20Hz).
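Skipping the near-DC region when searching for the peak can be sketched in C as below. `cepstrum_peak`, `min_bin` and `max_bin` are hypothetical names; how a bin maps to Hz depends on how the cepstrum was computed (this answer uses Cepstrum_Width / quefrency), so the bounds are left to the caller:

```c
#include <stddef.h>

/* Return the index of the largest cepstrum value in [min_bin, max_bin),
 * ignoring the large near-DC peak below min_bin and the high-quefrency end
 * at or above max_bin. Returns max_bin (after clamping to len) if the range
 * is empty. */
size_t cepstrum_peak(const float *cepstrum, size_t len,
                     size_t min_bin, size_t max_bin)
{
    if (max_bin > len)
        max_bin = len;
    size_t best = max_bin;  /* "no peak found yet" */
    for (size_t i = min_bin; i < max_bin; i++) {
        if (best == max_bin || cepstrum[i] > cepstrum[best])
            best = i;
    }
    return best;
}
```

In the question's callback, this would replace the loop that starts at bin 0, which is why the reported peak always lands at the beginning of the frame.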
If the input signal contains a series of very closely spaced overtones, the Cepstrum will also have a large peak at the high quefrency end.
For example, the plot below shows the Cepstrum of a Dirichlet Kernel of N=128 and Width=4096, the spectrum of which is a series of very closely spaced overtones.
You may want to use a static synthetic signal to test and debug your code. A good choice for a test signal is any sinusoid with a fundamental F and several overtones at exact integer multiples of F.
Your Cepstra should look something like the following examples.
First a synthetic signal.
The plot below shows the Cepstrum of a synthetic steady-state E2 note, synthesized using a typical near-DC component, a fundamental at 82.4 Hz, and 8 harmonics at integer multiples of 82.4 Hz. The synthetic sinusoid was programmed to generate 4096 samples.
Observe the prominent non-DC peak at 12.36. The Cepstrum width is 1024 (the output of the second FFT), therefore the peak corresponds to 1024/12.36 = 82.8 Hz which is very close to 82.4 Hz the true fundamental frequency.
Now a real acoustical signal.
The plot below shows the Cepstrum of a real acoustic guitar's E2 note. The signal was not windowed prior to the first FFT. Observe the prominent non-DC peak at 542.9. The Cepstrum width is 32768 (the output of the second FFT), therefore the peak corresponds to 32768/542.9 = 60.4 Hz which is fairly far from 82.4 Hz the true fundamental frequency.
The plot below shows the Cepstrum of the same real acoustic guitar's E2 note, but this time the signal was Hann windowed prior to the first FFT. Observe the prominent non-DC peak at 268.46. The Cepstrum width is 32768 (the output of the second FFT), therefore the peak corresponds to 32768/268.46 = 122.1 Hz which is even farther from 82.4 Hz the true fundamental frequency.
The acoustic guitar's E2 note used for this analysis was sampled at 44.1 kHz with a high quality microphone under studio conditions; it contains essentially zero background noise, no other instruments or voices, and no post-processing.
References:
Real audio signal data, synthetic signal generation, plots, FFT, and Cepstral analysis were done here: Musical instrument cepstrum

Calculate ray direction vector from screen coordinate

I'm looking for a better way (or a note that this is the best way) to transform a pixel coordinate into its corresponding ray direction from an arbitrary camera position/direction.
My current method is as follows. I define a "camera" as a position vector, a lookat vector, and an up vector, named accordingly. (Note that the lookat vector is a unit vector in the direction the camera is facing, NOT a target point from which the direction is computed, as is the standard in XNA's Matrix.CreateLookAt.) These three vectors uniquely define the camera's position and orientation. Here's the code (not the actual code, but a simplified, abstracted version; the language is HLSL):
float xPixelCoordShifted = (xPixelCoord / screenWidth * 2 - 1) * aspectRatio;
float yPixelCoordShifted = yPixelCoord / screenHeight * 2 - 1;
float3 right = normalize(cross(lookat, up)); // normalize in case up is not perpendicular to lookat
float3 actualUp = cross(right, lookat);
float3 rightShift = mul(right, xPixelCoordShifted);
float3 upShift = mul(actualUp, yPixelCoordShifted);
return normalize(lookat + rightShift + upShift);
(The return value is the direction of the ray.)
So what I'm asking is this: what's a better way to do this, maybe using matrices, etc.? The problem with this method is that if the viewing angle is too wide, the edges of the screen get sort of "radially stretched".
You can calculate the ray in the pixel shader. HLSL code:
float4x4 WorldViewProjMatrix; // World*View*Proj
float4x4 WorldViewProjMatrixInv; // (World*View*Proj)^(-1)
void VS( float4 vPos : POSITION,
out float4 oPos : POSITION,
out float4 pos : TEXCOORD0 )
{
oPos = mul(vPos, WorldViewProjMatrix);
pos = oPos;
}
float4 PS( float4 pos : TEXCOORD0 )
{
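float4 posWS = mul(pos, WorldViewProjMatrixInv);
// undo the perspective divide: this is the position of the point under this pixel
float3 ray = posWS.xyz / posWS.w;
// for a direction, subtract the camera position and normalize, e.g. normalize(ray - cameraPos)
return float4(0, 0, 0, 1);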
}
The information about your camera's position and direction is in the View matrix (Matrix.CreateLookAt).

Working out Oscillator wave type code, and creating new wave types

I want to understand the part of the code that generates the oscillator wave type in my tone generator app. The one in this example is a sine wave. Can someone tell me how the code works? In the future I want to make custom wave types, as well as square, sawtooth and triangle types.
OSStatus RenderTone(
void *inRefCon,
AudioUnitRenderActionFlags *ioActionFlags,
const AudioTimeStamp *inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList *ioData)
{
// Fixed amplitude is good enough for our purposes
const double amplitude = 0.25;
// Get the tone parameters out of the view controller
ToneGeneratorViewController *viewController =
(ToneGeneratorViewController *)inRefCon;
double theta = viewController->theta;
double theta_increment = 2.0 * M_PI * viewController->frequency / viewController->sampleRate;
// This is a mono tone generator so we only need the first buffer
const int channel = 0;
Float32 *buffer = (Float32 *)ioData->mBuffers[channel].mData;
// Generate the samples
for (UInt32 frame = 0; frame < inNumberFrames; frame++)
{
buffer[frame] = sin(theta) * amplitude;
theta += theta_increment;
if (theta > 2.0 * M_PI)
{
theta -= 2.0 * M_PI;
}
}
// Store the theta back in the view controller
viewController->theta = theta;
return noErr;
}
The actual sine wave samples are generated and written into the buffer in the snippet below:
for (UInt32 frame = 0; frame < inNumberFrames; frame++)
{
buffer[frame] = sin(theta) * amplitude;
theta += theta_increment;
if (theta > 2.0 * M_PI)
{
theta -= 2.0 * M_PI;
}
}
In the line where buffer[frame] is assigned, you call sin(theta) * amplitude, and on each iteration of the for loop you increment theta by a fixed step size based on your frequency and sample rate, via
double theta_increment = 2.0 * M_PI * viewController->frequency / viewController->sampleRate;
which is 2.0 * PI * frequency divided by your sample rate.
Incrementing theta inside the for loop advances time by one sample per iteration until your buffer is full (i.e. frame == inNumberFrames).
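As a worked example (440 Hz and 44100 Hz are illustrative values, not taken from the question), the phase step per sample and the number of samples in one cycle are:

```latex
\Delta\theta = \frac{2\pi f}{f_s} = \frac{2\pi \cdot 440}{44100} \approx 0.0627\ \text{rad per sample},
\qquad
\frac{f_s}{f} = \frac{44100}{440} \approx 100.2\ \text{samples per cycle}
```

So theta wraps past 2π roughly every 100 samples, and each wrap is one period of the output tone.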
If you wanted to generate something other than a sine wave, you would simply replace the following line with some other function:
buffer[frame] = sin(theta) * amplitude;
For example, say you wanted the first three terms of the infinite Fourier series that converges to a triangle wave; you might then use the following instead:
buffer[frame] = (8 / pow(M_PI,2)) * (sin(theta) - sin(3*theta)/9 + sin(5*theta)/25) * amplitude;
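Naive versions of the square, sawtooth and triangle shapes the question asks about can be written as functions of the same theta in [0, 2π). This is a sketch with hypothetical function names; unlike the Fourier-series version these are not band-limited, so they alias audibly at higher frequencies:

```c
#include <math.h>

/* Naive waveforms of the phase theta in [0, 2*pi), each in [-1, 1]. */
static const double kTwoPi = 2.0 * 3.14159265358979323846;

double square_wave(double theta)
{
    return theta < kTwoPi / 2.0 ? 1.0 : -1.0;  /* high for the first half-cycle */
}

double sawtooth_wave(double theta)
{
    return theta / (kTwoPi / 2.0) - 1.0;       /* ramps from -1 up to 1 */
}

double triangle_wave(double theta)
{
    double t = theta / kTwoPi;                 /* 0..1 over one period */
    if (t < 0.25) return 4.0 * t;              /* 0 -> 1 */
    if (t < 0.75) return 2.0 - 4.0 * t;        /* 1 -> -1 */
    return 4.0 * t - 4.0;                      /* -1 -> 0, same phase as sin() */
}
```

Any of them drops into the render loop in place of sin(), e.g. buffer[frame] = square_wave(theta) * amplitude; the existing wrap of theta at 2π keeps the input in range.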
To produce your desired waveform, you need to replace the sin() function with a function that produces your desired wave shape.
You might be able to find this function in a table of functions with graphical examples, or you might have to create your own function. There are lots of ways to create a functional approximation, including polynomials, Fourier series, table lookup with or without interpolation, recursion, etc. But that is a big subject on its own (many textbooks, etc.).

Using CUDA to find the pixel-wise average value of a bunch of images

So I have a cube of images, 512x512x512, and I want to sum up the images pixel-wise and save the result to a final image. So if all the pixels had value 1, every pixel of the final image would be 512. I am having trouble understanding the indexing to do this in CUDA. I figure one thread's job will be to sum up all 512 values at its pixel, so the total number of threads will be 512x512. I plan to do it with 512 blocks of 512 threads each. From here, I am having trouble coming up with the indexing to sum over the depth. Any help will be greatly appreciated.
One way to solve this problem is to imagine the cube as a set of Z slices. The X and Y coordinates refer to the width and height of the image, and the Z coordinate to each slice in the Z dimension. Each thread iterates over the Z coordinate to accumulate the values.
With this in mind, configure a kernel to launch a block of 16x16 threads and a grid of enough blocks to process the width and height of the image (I'm assuming a gray scale image with 1 byte per pixel):
#define THREADS 16
// kernel configuration: round the grid up so sizes that are not multiples of THREADS are still covered
dim3 dimBlock = dim3 ( THREADS, THREADS, 1 );
dim3 dimGrid = dim3 ( (WIDTH + THREADS - 1) / THREADS, (HEIGHT + THREADS - 1) / THREADS );
// call the kernel
kernel<<<dimGrid, dimBlock>>>(i_data, o_data, WIDTH, HEIGHT, DEPTH);
If you are clear on how to index a 2D array, looping through the Z dimension should be clear as well. Note that the output is unsigned int, because a sum of 512 bytes can reach 512 * 255, which would overflow an unsigned char:
__global__ void kernel(const unsigned char* i_data, unsigned int* o_data, int WIDTH, int HEIGHT, int DEPTH)
{
// in your kernel map from threadIdx/blockIdx to pixel position
int x = threadIdx.x + blockIdx.x * blockDim.x;
int y = threadIdx.y + blockIdx.y * blockDim.y;
// threads in the rounded-up edge blocks may fall outside the image
if (x >= WIDTH || y >= HEIGHT) return;
// calculate the global index of a pixel into the image array
// this global index is to the first slice of the cube
int idx = x + y * WIDTH;
// partial results
unsigned int r = 0;
// iterate in the Z dimension
for (int z = 0; z < DEPTH; ++z)
{
// WIDTH * HEIGHT is the offset of one slice
int idx_z = z * WIDTH*HEIGHT + idx;
r += i_data[ idx_z ];
}
// o_data is a 2D array, so you can use the global index idx
// (for the pixel-wise average, store r / (float)DEPTH in a float array instead)
o_data[ idx ] = r;
}
This is a naive implementation. In order to maximize memory throughput, the data should be properly aligned.
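A plain C reference of the same indexing can be handy for checking the GPU result on the host. `sum_slices` is a hypothetical helper; it assumes the cube is stored slice after slice, each slice row-major, exactly as the kernel above assumes:

```c
#include <stddef.h>
#include <stdint.h>

/* CPU reference: out[x + y*width] = sum over z of in[z*width*height + x + y*width]. */
void sum_slices(const uint8_t *in, uint32_t *out,
                size_t width, size_t height, size_t depth)
{
    size_t slice = width * height;  /* offset of one slice */
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            size_t idx = x + y * width;  /* index into the first slice */
            uint32_t r = 0;
            for (size_t z = 0; z < depth; z++)
                r += in[z * slice + idx];
            out[idx] = r;
        }
    }
}
```

After copying the kernel's output back to the host, comparing it element by element with this function's result is a quick correctness check; with all input values equal to 1, every output pixel equals the depth.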
This can also be done easily using the ArrayFire GPU library (free). In ArrayFire, you can construct 3D arrays as shown below.
There are two approaches:
// Method 1:
array data = rand(x,y,z);
// Just reshaping the array, this is a noop
data = newdims(data,x*y, z, 1);
// Sum of pixels
res = sum(data);
// Method 2:
// Use ArrayFire "GFOR"
array data = rand(x,y,z);
array res = zeros(z,1);
gfor(array i, z) {
res(i) = sum(data(span, span, i));
}