I am using AVSampleBufferDisplayLayer to display video that is being streamed over the network. On the sending side an AVCaptureSession is used to capture CMSampleBuffers which are serialized into NAL units and streamed to the receiver, which then turns them back into CMSampelBuffers and feeds them to AVSampleBufferDisplayLayer (as is described for instance here). It works quite well - I can see the video and it streams more or less smoothly.
If I set the capture session's sessionPreset to AVCaptureSessionPresetHigh the video shown on the receiving side is cut in half - the top half displays the video from the sender while the bottom half is a solid dark green. If I use any other preset (e.g. AVCaptureSessionPresetMedium or AVCaptureSessionPreset1280x720) the video displays in its entirety.
Has anyone encountered such an issue, or has any idea what might cause it?
I tried examining the data at the source as well as the data at the destination, to see if I can determine where the image is being chopped off, but I have not been successful. It occurred to me that perhaps the high quality frame is being split into more than one NALUs and I am not putting it together correctly - is that possible? How does such splitting look like on the elementary-stream level (if possible at all)?
Thank you
Amos
The problem turned out to be that at the preset AVCaptureSessionPresetHigh one frame would get split into more than one type 5 (or type 1) NALU. On the receiving side I was combining the SPS, PPS and type 1 (or type 5) NAL units into a CMSampleBuffer, but ignoring the second part of the frame if they were split, which caused the problem.
In order to recognize if two successive NAL units belong to the same frame it is necessary to parse the slice header of the picture NAL units. This requires delving into the specification, but goes more or less like this: the first field of the slice header is first_mb_in_slice which is encoded in Golomb encoding. Next come slice_type and the pic_aprameter_set_id, also in Golomb encoding, and finally the frame_number, as an unsigned integer of length (log2_max_frame_num_minus_4 + 4) bits (to get the value of log2_max_frame_num_minus_4 it is necessary to parse the PPS corresponding to this frame). If two consecutive NAL units have the same frame_num they are part of the same frame and should be put into the same CMSampleBuffer.
Related
Using pyiron, I want to calculate the mean square displacement of the ions in my system. How do I see the total displacement (i.e. not folded back by periodic boundary conditions) without dumping very frequently and checking when an atom passes over the boundary and gets wrapped?
Try to compare job['output/generic/unwrapped_positions'][-1] and job.structure.positions+job.output.total_displacements[-1]. If they deliver the same values, it's definitely fine both ways. If not, you can post the relevant lines in your notebook here.
I'd like to add a few comments to Jan's answer:
While job['output/generic/unwrapped_positions'] returns the unwrapped positions parsed from the output files, job.output.total_displacements returns the displacement of atoms calculated from each pair of consecutive snapshots. So if an atom moves more than half the box length in any direction, job.output.total_displacements will give wrong coordinates. Therefore, job['output/generic/unwrapped_positions'] is generally more trustworthy, but it is not available in all the codes (since some codes simply do not provide an output for unwrapped positions).
Moreover, if an interactive job is used, it is possible that job.structure.positions does not return the initial positions, i.e. job.structure.positions+job.output.total_displacements won't be initial positions + displacements.
So, in short, my answer to your question would be rather "Use job['output/generic/unwrapped_positions'] and if it's not available, use job.structure.positions+job.output.total_displacements but be aware of potential problems you might be running into."
This is more of an open discussion topic than anything else. Currently I'm storing 50 Float32 values in my NSMutableArray *voltageArray before I refresh my CPTPlot *plot. Every time I obtain 50 values, I remove the previous 50 from the voltageArray and repeat the process....always displaying the 50 values in "real time" on my plot.
However, the data I'm receiving (which is voltage coming from a Cypress BLE module equipped with a pressure transducer) is so quick that any variation (0.4 V to 4.0 V; no pressure to lots of pressure) cannot be seen on my graph. It just shows up as a straight line, varying up and down without showing increased or decreased slopes.
To show overall change, I wanted to take those 50 values, store them in the first index of another NSMutableArray *stampArray and use the index of stampArray to display information. Meanwhile, the numberOfRecordsForPlot: method would look like this:
- (NSUInteger)numberOfRecordsForPlot:(CPTPlot *)plotnumberOfRecords {
return (DATA_PER_STAMP * _stampCount);
}
This would initially be 50, then after 50 pieces of data are captured from the BLE module, _stampCount would increase by one, and the number of records for plot would increase by 50 (till about 2500-10000 range, then I'd refresh the whole the thing and restart the process.)
Is this the right approach? How would I be able to make the first 50 points stay on the graph, while building the next 50, etc.? Imagine an y = x^2 graph, and what the graph looks like when applying integration (the whole breaking the area under the curve into rectangles).
Look at the "Real Time Plot" demo in the Plot Gallery example app included with Core Plot. It starts off with an empty plot, adding a new point each cycle until reaching the maximum number of points. After that, one old point is removed for each new one added so the total number stays constant. The demo uses a timer to pass random data to the plot, but your app can of course collect data from anywhere. Be sure to always interact with the graph from the main thread.
I doubt you'll be able to display 10,000 data points on one plot (does your display have enough pixels to resolve that many points?). If not, you'll get much better drawing performance if you filter and/or smooth the data to remove some of the points before sending them to the plot.
I'm trying to understand the names of the items in the VkFormat enum, and so far I think I get all the structure of the names of all of the (non-block) formats, but I can't figure out what it means when they have a suffix of PACK8, PACK16, PACK32. If I add up the channel sizes, they always add up to 8, 16, or 32, nothing irregular, so I don't understand what it would mean to bit-pack these values, since they seem to be 100% efficient, using all their bits.
As usual, the documentation is not very helpful, just saying the format is packed without saying what that means.
The PACK fields mean exactly what the specification says they mean:
whole texels or attributes are stored in a single data element, rather than individual components occupying a single data element
Though if you find that too confusing, you could just look at the actual format descriptions. Vulkan goes into excruciating detail about them, to the point of needless repetition.
The difference between VK_FORMAT_B8G8R8A8_RGB and VK_FORMAT_B8G8R8A8_RGB_PACK32 is the same difference between a uint8_t[4] and a uint32_t. One is an array ("individual components"), while the other is a single value ("single data element") made up of smaller values.
If you have a uint8_t color[4] array, which stores B8G8R8A8, then color[0] stores the blue component. The order of the components in the array is defined by the order of the components in the format's name.
If you have a uint32_t color value, which stores B8G8R8A8, then (color & 0xFF000000) >> 24 will retrieve the blue component. The highest byte is the first, followed by the next highest and so forth.
The reason the packed-vs-not-packed distinction matters is because of endian issues. Arrays of bytes don't have endian issues. But values packed into 16 or 32-bits do have endian issues. The endian of the packed formats is always assumed to be the native endian of the host.
I've built an AVI M-jpeg encoder which basically build an AVI Riff header with all the infos.
I'm adding a frame index at the end of the video stream as specified in the specs.
Index is built as follow:
idx1[Size], then 00dc[0x10,0x00,0x00,0x00][Offset from frame X][Size from frame X] until the end. I compared to any other AVI file, and everything is the same. So I can't understand where softwares don't find - or search for - the index in my AVI file. Also verified several time that each tag has the good byte length indicated after. By the way, there is the good padding in each offset, and the length is the size of the jpeg only.
I attached the current rendered file: movie.avi
I spent the whole day trying to figure out what is the problem with my index. AVI spec is really simple, so I'm smashing my head on the desk.
[Edit]
As soon as my video is longer than 1 second, it fails. That makes no sense for me currently as the algorithm is the same, whatever how many frames are written.
Your AVI file violates the alignment rule: every chunk must start at an even byte.
Add a zero byte after every odd-length frame, and update the index accordingly. The chunk size in the header should still be odd to tell the true size of the data, but all offsets should be even.
I would like to know if anyone knows how to perform a cross-correlation between two audio signals on iOS.
I would like to align the FFT windows that I get at the receiver (I am receiving the signal from the mic) with the ones at the transmitter (which is playing the audio track), i.e. make sure that the first sample of each window (besides a "sync" period) at the transmitter will also be the first window at the receiver.
I injected in every chunk of the transmitted audio a known waveform (in the frequency domain). I want estimate the delay through cross-correlation between the known waveform and the received signal (over several consecutive chunks), but I don't know how to do it.
It looks like there is the method vDSP_convD to do it, but I have no idea how to use it and whether I first have to perform the real FFT of the samples (probably yes, because I have to pass double[]).
void vDSP_convD (
const double __vDSP_signal[],
vDSP_Stride __vDSP_signalStride,
const double __vDSP_filter[],
vDSP_Stride __vDSP_strideFilter,
double __vDSP_result[],
vDSP_Stride __vDSP_strideResult,
vDSP_Length __vDSP_lenResult,
vDSP_Length __vDSP_lenFilter
)
The vDSP_convD() function calculates the convolution of the two input vectors to produce a result vector. It’s unlikely that you want to convolve in the frequency domain, since you are looking for a time-domain result — though you might, if you have FFTs already for some other reason, choose to multiply them together rather than convolving the time-domain sequences (but in that case, to get your result, you will need to perform an inverse DFT to get back to the time domain again).
Assuming, of course, I understand you correctly.
Then once you have the result from vDSP_convD(), you would want to look for the highest value, which will tell you where the signals are most strongly correlated. You might also need to cope with the case where the input signal does not contain sufficient of your reference signal, and in that case you may wish to (for example) ignore values in the result vector below a certain level.
Cross-correlation is the solution, yes. But there are many obstacles you need to handle. If you get samples from the audio files, they contain padding which cross-correlation function does not like. It is also very inefficient to perform correlation with all those samples - it takes a huge amount of time. I have made a sample code which demonstrates time shift of two audio files. If you are interested in the sample, look at my Github Project.