Compile error when trying to access StructuredBuffer - compute-shader

I want to access a StructuredBuffer<int> in a compute shader but I get the error:
Shader error in 'Particle.compute': array, matrix, vector, or indexable object type expected in index expression at Particle.compute(28) (on d3d11)
The code:
#pragma kernel CSMain
#include "Assets/Uplus/ZCommon/Resources/ImageProcessing/UplusDirectCompute.cginc"
struct Particle
{
    float3 Position;
    float Mass;
};
Texture2D<float2> _terTx;
ConsumeStructuredBuffer<Particle> currentBuffer;
AppendStructuredBuffer<Particle> nextBuffer;
StructuredBuffer<int> particleCount;
float3 _terPos;
float _terSize, _terPhysicalScale, _resolution;
SamplerState _LinearClamp;
SamplerState _LinearRepeat;
#define _gpSize 512
[numthreads(_gpSize, 1, 1)]
void CSMain(uint3 dispatchID : SV_DispatchThreadID)
{
    int flatID = dispatchID.x;
    int particleCount = particleCount[0];
    if (flatID >= particleCount) return;
    Particle particle = currentBuffer.Consume();
    //Commented the rest of code
    nextBuffer.Append(particle);
}
The error points to the line int particleCount = particleCount[0];. Why is that?
The whole idea behind the shader is that we have two buffers. We fill one with data from the CPU (we call each element a Particle), then in the shader we consume the data from that buffer, process it, and append it to the other buffer. Then we swap the buffers and do another iteration. The particleCount buffer holds the current number of Particles in the buffer, and the if clause prevents consuming more Particles than are available.

This is an old question so I assume you solved it, but here is the answer anyway:
You are declaring a local int named particleCount, which shadows the buffer of the same name; in the initializer, particleCount[0] therefore tries to index an int, which is exactly what the compiler is complaining about.
Either rename the local variable, e.g. int currentParticleCount = particleCount[0];, or just don't use a temporary variable at all:
if (flatID >= particleCount[0]) return;

Related

STM32 reading variables out of Received Buffer with variable size

I am not really familiar with programming on STM32. I am using the microcontroller STM32F303RE.
I am receiving data via a UART connection with DMA.
Code:
HAL_UARTEx_ReceiveToIdle_DMA(&huart2, RxBuf, RxBuf_SIZE);
__HAL_DMA_DISABLE_IT(&hdma_usart2_rx, DMA_IT_HT);
I am writing the received data into a receive buffer and then transferring it into a main buffer. The following declarations and the callback are placed before int main(void).
#define RxBuf_SIZE 100
#define MainBuf_Size 100
uint8_t RxBuf[RxBuf_SIZE];
uint8_t MainBuf[MainBuf_Size];
void HAL_UARTEx_RxEventCallback(UART_HandleTypeDef *huart, uint16_t Size){
    if (huart->Instance == USART2){
        memcpy(MainBuf, RxBuf, Size);
        HAL_UARTEx_ReceiveToIdle_DMA(&huart2, RxBuf, RxBuf_SIZE);
    }
    for (int i = 0; i < Size; i++){
        if ((MainBuf[i] == 'G') && (MainBuf[i+1] == 'O')){
            RecieveData();
            HAL_UART_DMAStop(&huart2);
        }
    }
}
I now receive the data into a buffer, and reception stops as soon as "GO" is transmitted. Up to this point it is working. The function ReceiveData() should then turn this buffer into the variables, but that part isn't working for me.
Now I want to transform this received data with "breakpoints" into variables.
So I want to send: "S2000S1000S1S10S2GO".
I always have 5 variables (in this case: 2000, 1000, 1, 10 and 2). I want to read the values out of the string and convert each one into a uint16_t for further processing. The length of each value can change, which is why I tried to use the 'S' characters as a kind of breakpoint/delimiter.
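One way to split such a frame is to walk the buffer and let strtol consume each number (a minimal sketch, assuming the frame always has the form S<value>S<value>...GO with exactly five decimal values and that the buffer passed in is NUL-terminated; ParseFrame and NUM_VALUES are made-up names):
#include <stdint.h>
#include <stdlib.h>

#define NUM_VALUES 5

/* Parse a frame like "S2000S1000S1S10S2GO" into five uint16_t values.
 * Returns 0 on success, -1 if fewer than NUM_VALUES values were found. */
static int ParseFrame(const char *frame, uint16_t values[NUM_VALUES])
{
    const char *p = frame;
    int count = 0;
    while (*p != '\0' && count < NUM_VALUES) {
        if (*p == 'S') {
            char *next;
            /* strtol stops at the next non-digit ('S' of the following value
             * or the 'G' of "GO"), so variable-length values are handled. */
            long v = strtol(p + 1, &next, 10);
            values[count++] = (uint16_t)v;
            p = next;
        } else {
            p++;
        }
    }
    return (count == NUM_VALUES) ? 0 : -1;
}
Called from the callback this could look like MainBuf[Size] = '\0'; ParseFrame((const char *)MainBuf, values); (with a check that Size < MainBuf_Size so the terminator fits).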

Constrained Delaunay with Info

I have this setup:
typedef CGAL::Exact_predicates_inexact_constructions_kernel K;
typedef CGAL::Triangulation_vertex_base_with_info_2<FIntPoint, K> Vb;
typedef CGAL::Constrained_triangulation_face_base_2<K> Fb;
typedef CGAL::Triangulation_data_structure_2<Vb, Fb> Tds;
typedef CGAL::Exact_predicates_tag Itag;
typedef CGAL::Constrained_Delaunay_triangulation_2<K, Tds, Itag> Delaunay;
typedef Delaunay::Point Point;
typedef Delaunay::Vertex_handle VertexHandle;
FIntPoint is a struct with 2 integers.
My vertices belong to border triangles of 2 separate terrain-like 3D meshes. I triangulate these vertices in 2D to connect these meshes. The info I need for each vertex is (1) which mesh it belongs to and (2) its vertex index within that mesh, so that I know where this vertex came from after triangulation, as more information is tied to these indices.
In this illustration, you can see the 2 meshes and their border triangles. I want to stitch these together:
This is basically how I insert points, simplified:
Delaunay Triangulation;
for (...)
{
    // Insert a triangle from one of my meshes
    VertexHandle vh1 = Triangulation.insert(Point(a.Y, a.Z));
    vh1->info() = FIntPoint(iA, iM);
    VertexHandle vh2 = Triangulation.insert(Point(b.Y, b.Z));
    vh2->info() = FIntPoint(iB, iM);
    VertexHandle vh3 = Triangulation.insert(Point(c.Y, c.Z));
    vh3->info() = FIntPoint(iC, iM);
    // Add constraints to keep this triangle exactly how it was in the origin mesh
    Triangulation.insert_constraint(vh1, vh2);
    Triangulation.insert_constraint(vh2, vh3);
    Triangulation.insert_constraint(vh3, vh1);
}
My problem is that when I try to retrieve the info, it is missing/incorrect: I get values outside the range of what I inserted.
This happens only with the constraints. When I comment out the lines with insert_constraint, it works as expected.
for (auto itFace = Triangulation.finite_faces_begin(); itFace != Triangulation.finite_faces_end(); itFace++)
{
    // Retrieve vertex info
    auto& a = itFace->vertex(0)->info();
    auto& b = itFace->vertex(1)->info();
    auto& c = itFace->vertex(2)->info();
    // ...
}
What am I doing wrong with the constraints?
Additionally, I want to limit the length of edges. I haven't attempted to add this limitation in my code yet, because I wanted to fix the described problem first. If you could provide an answer that includes a maximum edge length, that would be very helpful too.
Any help is appreciated. I've been sitting on this problem for a long time.
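One thing worth checking (a debugging sketch of my own, not a confirmed fix): with Exact_predicates_tag, constraints are allowed to intersect, and CGAL then inserts the intersection points itself; those new vertices get default-constructed info(), which could produce exactly the out-of-range values described. The sketch below scans all finite vertices after insertion and counts those whose info does not look like one you assigned; it assumes FIntPoint has integer members X and Y (X = vertex index, Y = mesh id, matching the FIntPoint(iA, iM) calls above), and numMeshes / maxVertexIndex are placeholders for your real bounds:
int suspicious = 0;
for (auto itV = Triangulation.finite_vertices_begin();
     itV != Triangulation.finite_vertices_end(); ++itV)
{
    const FIntPoint& info = itV->info();
    bool indexOk = (info.X >= 0 && info.X < maxVertexIndex); // assumed: X = vertex index
    bool meshOk  = (info.Y >= 0 && info.Y < numMeshes);      // assumed: Y = mesh id
    if (!indexOk || !meshOk)
        ++suspicious; // likely a vertex created during insert_constraint, not by you
}
If suspicious is non-zero, faces touching those vertices are the ones returning garbage info, and you would need to either handle such vertices explicitly or avoid intersecting constraints.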

Gather AVX2&512 intrinsic for 16-bit integers?

Imagine this piece of code:
void Function(int16 *src, int *indices, float *dst, int cnt, float mul)
{
for (int i=0; i<cnt; i++) dst[i] = float(src[indices[i]]) * mul;
};
This really calls for gather intrinsics, e.g. _mm_i32gather_epi32. I have had great success with these when loading floats, but are there any for 16-bit ints? Another problem here is that I need to go from 16 bits on the input to 32 bits (float) on the output.
There is indeed no instruction to gather 16-bit integers, but (assuming there is no risk of a memory-access violation) you can just load 32-bit integers starting at the corresponding addresses and mask out the upper half of each value.
For uint16_t this would be a simple bit-and; for signed integers you can shift the values to the left so that the sign bit ends up in the most-significant position. You can then (arithmetically) shift the values back before converting them to float, or, since you multiply them anyway, just scale the multiplication factor accordingly.
Alternatively, you could load from two bytes earlier and arithmetically shift to the right. Either way, your bottleneck will likely be the load ports (vpgatherdd requires 8 load uops; together with the load for the indices that is 9 loads distributed over two ports, which should result in about 4.5 cycles for 8 elements).
Untested possible AVX2 implementation (it does not handle the last elements; if cnt is not a multiple of 8, just execute your original loop at the end):
#include <immintrin.h>
#include <cstdint>

void Function(int16_t const *src, int const *indices, float *dst, size_t cnt, float mul_)
{
    __m256 mul = _mm256_set1_ps(mul_ * float(1.0f/0x10000));
    for (size_t i = 0; i + 8 <= cnt; i += 8) { // todo: handle last elements
        // load indices:
        __m256i idx = _mm256_loadu_si256(reinterpret_cast<__m256i const*>(indices + i));
        // load 16-bit integers in the lower halves + garbage in the upper halves:
        __m256i values = _mm256_i32gather_epi32(reinterpret_cast<int const*>(src), idx, 2);
        // shift each value to the upper half (removes garbage, puts the sign bit at the right place)
        // values are now too large by a factor of 0x10000
        values = _mm256_slli_epi32(values, 16);
        // convert to float, scale and multiply:
        __m256 fvalues = _mm256_mul_ps(_mm256_cvtepi32_ps(values), mul);
        // store result
        _mm256_storeu_ps(dst + i, fvalues);
    }
}
Porting this to AVX-512 should be straightforward.
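For reference, here is an equally untested sketch of what the AVX-512 version might look like, processing 16 elements per iteration; note that _mm512_i32gather_epi32 takes the index vector first and the base pointer second, the reverse of the AVX2 argument order:
#include <immintrin.h>
#include <cstdint>

void Function512(int16_t const *src, int const *indices, float *dst, size_t cnt, float mul_)
{
    __m512 mul = _mm512_set1_ps(mul_ * float(1.0f/0x10000));
    for (size_t i = 0; i + 16 <= cnt; i += 16) { // todo: handle last elements
        // load 16 indices
        __m512i idx = _mm512_loadu_si512(indices + i);
        // each 32-bit load has the wanted 16-bit value in its lower half, garbage above
        __m512i values = _mm512_i32gather_epi32(idx, src, 2);
        // move each value (and its sign bit) into the upper half; values are 0x10000 too large
        values = _mm512_slli_epi32(values, 16);
        // convert to float; the pre-scaled multiplier undoes the 0x10000 factor
        __m512 fvalues = _mm512_mul_ps(_mm512_cvtepi32_ps(values), mul);
        _mm512_storeu_ps(dst + i, fvalues);
    }
}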

How can I read individual pixels from a CVPixelBuffer

AVDepthData gives me a CVPixelBuffer of depth data. But I can't find a way to easily access the depth information in this CVPixelBuffer. Is there a simple recipe in Objective-C to do so?
You have to use the CVPixelBuffer APIs to find out the right pixel format and then access the data through pointer manipulation. Here is the basic way:
CVPixelBufferRef pixelBuffer = _lastDepthData.depthDataMap;
CVPixelBufferLockBaseAddress(pixelBuffer, 0);
size_t cols = CVPixelBufferGetWidth(pixelBuffer);
size_t rows = CVPixelBufferGetHeight(pixelBuffer);
Float32 *baseAddress = CVPixelBufferGetBaseAddress( pixelBuffer );
// This next step is not necessary, but I include it here for illustration,
// you can get the type of pixel format, and it is associated with a kCVPixelFormatType
// this can tell you what type of data it is e.g. in this case Float32
OSType type = CVPixelBufferGetPixelFormatType( pixelBuffer);
if (type != kCVPixelFormatType_DepthFloat32) {
    NSLog(@"Wrong type");
}
// Arbitrary values of x and y to sample
int x = 20; // must be lower than cols
int y = 30; // must be lower than rows
// Get the pixel. You could iterate here of course to get multiple pixels!
int baseAddressIndex = y * (int)cols + x;
const Float32 pixel = baseAddress[baseAddressIndex];
CVPixelBufferUnlockBaseAddress( pixelBuffer, 0 );
Note that the first thing you need to determine is what type of data is in the CVPixelBuffer; if you don't know this, you can use CVPixelBufferGetPixelFormatType() to find out. In this case I am getting depth data as Float32; if you were using another type, e.g. Float16, then you would need to replace all occurrences of Float32 with that type.
Note that it's important to lock and unlock the base address using CVPixelBufferLockBaseAddress and CVPixelBufferUnlockBaseAddress.
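If you iterate over the whole buffer rather than sampling a single pixel, it is safer to step by CVPixelBufferGetBytesPerRow() instead of assuming rows are exactly cols * sizeof(Float32) apart, because rows can be padded. A minimal sketch (same Float32 depth format as above; averaging the depth is just an arbitrary example):
CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
size_t cols        = CVPixelBufferGetWidth(pixelBuffer);
size_t rows        = CVPixelBufferGetHeight(pixelBuffer);
size_t bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer);
uint8_t *base      = (uint8_t *)CVPixelBufferGetBaseAddress(pixelBuffer);

double sum = 0.0;
for (size_t y = 0; y < rows; y++) {
    // Step by bytesPerRow, not by cols, in case rows are padded
    const Float32 *row = (const Float32 *)(base + y * bytesPerRow);
    for (size_t x = 0; x < cols; x++) {
        sum += row[x];
    }
}
CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);

Float32 averageDepth = (Float32)(sum / (double)(rows * cols));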

Objective C variable value not being preserved

I'm doing some audio programming for a client and I've come across an issue which I just don't understand.
I have a render callback which is called repeatedly by CoreAudio. Inside this callback I have the following:
// Get the audio sample data
AudioSampleType *outA = (AudioSampleType *)ioData->mBuffers[0].mData;
Float32 data;
// Loop over the samples
for (UInt32 frame = 0; frame < inNumberFrames; frame++) {
    // Convert from SInt16 to Float32 just to prove it's possible
    data = (Float32) outA[frame] / (Float32) 32768;
    // Convert back to SInt16 to show that everything works as expected
    outA[frame] = (SInt16) round(data * 32768);
}
This works as expected, which shows there aren't any unexpected rounding errors.
The next thing I want to do is add a small delay. I add a global variable to the class:
i.e. below the #implementation line
Float32 last = 0;
Then I use this variable to get a one frame delay:
// Get the audio sample data
AudioSampleType *outA = (AudioSampleType *)ioData->mBuffers[0].mData;
Float32 data;
Float32 next;
// Loop over the samples
for (UInt32 frame = 0; frame < inNumberFrames; frame++) {
    // Convert from SInt16 to Float32 just to prove it's possible
    data = (Float32) outA[frame] / (Float32) 32768;
    next = last;
    last = data;
    // Convert back to SInt16 to show that everything works as expected
    outA[frame] = (SInt16) round(next * 32768);
}
This time round there's a strange audio distortion on the signal.
I just can't see what I'm doing wrong! Any advice would be greatly appreciated.
It seems that what you've done is introduce an unintentional phaser effect on your audio.
This is because you're only delaying one channel of your audio, so the left channel ends up delayed one frame behind the right channel. This results in some odd frequency cancellations/amplifications that would fit your description of "a strange audio distortion".
Try applying the effect to both channels:
AudioSampleType *outA = (AudioSampleType *)ioData->mBuffers[0].mData;
AudioSampleType *outB = (AudioSampleType *)ioData->mBuffers[1].mData;
// apply the same effect to outB as you did to outA
This assumes that you are working with stereo audio (i.e. ioData->mNumberBuffers == 2).
As a matter of style, it's (IMO) a bad idea to use a global like your last variable in a render callback. Use the inRefCon to pass in proper context (either as a single variable or as a struct if necessary). This likely isn't related to the problem you're having, though.
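For illustration, a rough sketch of that suggestion (DelayState, RenderCallback and myDelayState are made-up names; it assumes stereo, keeps one frame of history per channel, and passes the state through inRefCon instead of a global):
#include <AudioUnit/AudioUnit.h>
#include <math.h>

typedef struct {
    Float32 lastL;
    Float32 lastR;
} DelayState;

static OSStatus RenderCallback(void *inRefCon,
                               AudioUnitRenderActionFlags *ioActionFlags,
                               const AudioTimeStamp *inTimeStamp,
                               UInt32 inBusNumber,
                               UInt32 inNumberFrames,
                               AudioBufferList *ioData)
{
    DelayState *state = (DelayState *)inRefCon;
    AudioSampleType *outA = (AudioSampleType *)ioData->mBuffers[0].mData;
    AudioSampleType *outB = (AudioSampleType *)ioData->mBuffers[1].mData;

    for (UInt32 frame = 0; frame < inNumberFrames; frame++) {
        // One-frame delay applied identically to both channels
        Float32 curL = (Float32) outA[frame] / (Float32) 32768;
        Float32 curR = (Float32) outB[frame] / (Float32) 32768;
        outA[frame] = (SInt16) round(state->lastL * 32768);
        outB[frame] = (SInt16) round(state->lastR * 32768);
        state->lastL = curL;
        state->lastR = curR;
    }
    return noErr;
}

// When wiring up the audio unit, pass the state in via the callback struct, e.g.:
// AURenderCallbackStruct cb = { RenderCallback, &myDelayState };
// AudioUnitSetProperty(unit, kAudioUnitProperty_SetRenderCallback,
//                      kAudioUnitScope_Input, 0, &cb, sizeof(cb));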