HEVC: Picture split up in multiple NAL units - hevc

During some development work regarding HEVC I encountered a camera that streams its pictures in multiple NAL units. When analyzing the bitstream I noticed that there are 3 NAL units per picture. I noticed this with the first_slice_segment_in_pic_flag = 1 for the first NAL unit and first_slice_segment_in_pic_flag = 0 for the 2 following NAL units.
Is there any way of knowing how many NAL units make up a full picture or which NAL unit is the final one of the current picture without looking at the 4th NAL unit (the first one of the next picture)?

No its not possible. You get to know end of the picture after decoding last CTU of the picture or when you get the first slice of the next picture.

Related

Punctured convolutional codes in GNU Radio

I have been able to successfully decode CCSDS punctured convolutional codes in GNU Radio. However, the decoding process has involved some peculiar characteristics that I would like to understand more.
First of all, the CCSDS puncture matrices are shown in the figure below.
The puncture/depuncture blocks in GNU Radio expects a puncture pattern and a puncture size, shown in the table below for the different rates.
At the beginning, the flowgraph was not able to decode the convolutionally encoded bitstream, until I delay the bitstream (circled in RED). The delay values that work for the respective puncture rates are shown on the right-most column of the table above. The delay values show some periodicity, which for some reason turns out to be the denominator of the puncture rate e.g. for rate 2/3, the delay values; 2,5,8,11,14; have a period of 3, which is denom(2/3). The same conclusion can be followed for all the other rates.
I would like to understand why does this happen. Why dont the depuncturer/cc_decoder just work without the delay?
Regards,
M.

How to slow down a file source in GNU Radio?

I'm attempting to unpack bytes from an input file in GNU Radio Companion into a binary bitstream. My problem is that the Unpack K Bits block works at the same sample rate as the file source. So by the time the first bit of byte 1 is clocked out, byte 2 has already been loaded. How do I either slow down the file source or speed up the Unpack K Bits block? Is there a way I can tell GNU Radio Companion to repeat each byte from the file source 8 times?
Note that "after pack" is displaying 4 times as much data as "before pack".
My problem is that the Unpack K Bits block works at the same sample rate as the file source
No it doesn't. Unpack K Bits is an interpolator block. In your case the interpolation is 8. For every bytes 8 new bytes are produced.
The result is right, but the time scale of your sink is wrong. You have to change the sampling rate at the second GUI Time Sink to fit the true sampling rate of the flowgraph after the Unpack K Bits.
So instead of 32e3 it should be 8*32e3.
Manos' answer is very good, but I want to add to this:
This is a common misunderstanding for people that just got in touch with doing digital signal processing down at the sample layer:
GNU Radio doesn't have a notion of sampling rate itself. The term sampling rate is only used by certain blocks to e.g. calculate the period of a sine (in the case of the signal source: Period = f_signal/f_sample), or to calculate times or frequencies that are written on display axes (like in your case).
"Slowing down" means "making the computer process samples slower", but doesn't change the signal.
All you need to do is match what you want the displaying sink to show as time units with what you configure it to do.

What is the name of HEVC input frame?

I am working on rate control algorithms in HEVC. I have a problem. Can any one help me with how can I find each frame of an input video in HEVC? (Because I need to work on each frame as a distinct image but I don't know in what variable in HEVC each input frame is stored an processed.)
To find new frame
Parse the input data as per HEVC draft (annex B) to find start of frame (start_code_prefix_one_3bytes ). This will give you new NAL units.
If you find VCL-NAL parse slice header and find value of first_slice_segment_in_pic_flag equal to 1 specifies that the slice segment is the first slice segment of the picture in decoding order.
Do same process until you will get next vcl nal containing first_slice_segment_in_pic_flag equal to 1. So this will indicate you this is the start of new or next frame.

how to calculate how much data can be embeded into an image

I want to know how much data can be embedded into an image of different sizes.
For example in 30kb image file how much data can be stored without distortion of the image.
it depends on the image type , algoridum , if i take a example as a 24bitmap image to store ASCII character
To store a one ASCII Character = Number of Pixels / 8 (one ASCII = 8bits )
It depends on two points:
How much bits per pixel in your image.
How much bits you will embed in one pixel .
O.K lets suppose that your color model is RGB and each pixel = 8*3 bits (one byte for each color), and you want embed 3 bits in one pixel.
data that can be embedded into an image = (number of pixels * 3) bits
If you would use the LSB to hide your information this would give 30000Bits of available space to use. 3750 bytes.
As the LSB represents 1 or 0 into a byte that gets values from 0-256 this gives you in the worst case scenario that you are going to modify all the LSBs distortion of 1/256 that equals 0,4%.
In the statistical average scenario you would get 0,2% distortion.
So depends on which bit of the byte you are going to change.

VB FFT - stuck understanding relationship of results to frequency

Trying to understand an fft (Fast Fourier Transform) routine I'm using (stealing)(recycling)
Input is an array of 512 data points which are a sample waveform.
Test data is generated into this array. fft transforms this array into frequency domain.
Trying to understand relationship between freq, period, sample rate and position in fft array. I'll illustrate with examples:
========================================
Sample rate is 1000 samples/s.
Generate a set of samples at 10Hz.
Input array has peak values at arr(28), arr(128), arr(228) ...
period = 100 sample points
peak value in fft array is at index 6 (excluding a huge value at 0)
========================================
Sample rate is 8000 samples/s
Generate set of samples at 440Hz
Input array peak values include arr(7), arr(25), arr(43), arr(61) ...
period = 18 sample points
peak value in fft array is at index 29 (excluding a huge value at 0)
========================================
How do I relate the index of the peak in the fft array to frequency ?
If you ignore the imaginary part, the frequency distribution is linear across bins:
Frequency#i = (Sampling rate/2)*(i/Nbins).
So for your first example, assumming you had 256 bins, the largest bin corresponds to a frequency of 1000/2 * 6/256 = 11.7 Hz.
Since your input was 10Hz, I'd guess that bin 5 (9.7Hz) also had a big component.
To get better accuracy, you need to take more samples, to get smaller bins.
Your second example gives 8000/2*29/256 = 453Hz. Again, close, but you need more bins.
Your resolution here is only 4000/256 = 15.6Hz.
It would be helpful if you were to provide your sample dataset.
My guess would be that you have what are called sampling artifacts. The strong signal at DC ( frequency 0 ) suggests that this is the case.
You should always ensure that the average value in your input data is zero - find the average and subtract it from each sample point before invoking the fft is good practice.
Along the same lines, you have to be careful about the sampling window artifact. It is important that the first and last data point are close to zero because otherwise the "step" from outside to inside the sampling window has the effect of injecting a whole lot of energy at different frequencies.
The bottom line is that doing an fft analysis requires more care than simply recycling a fft routine found somewhere.
Here are the first 100 sample points of a 10Hz signal as described in the question, massaged to avoid sampling artifacts
> sinx[1:100]
[1] 0.000000e+00 6.279052e-02 1.253332e-01 1.873813e-01 2.486899e-01 3.090170e-01 3.681246e-01 4.257793e-01 4.817537e-01 5.358268e-01
[11] 5.877853e-01 6.374240e-01 6.845471e-01 7.289686e-01 7.705132e-01 8.090170e-01 8.443279e-01 8.763067e-01 9.048271e-01 9.297765e-01
[21] 9.510565e-01 9.685832e-01 9.822873e-01 9.921147e-01 9.980267e-01 1.000000e+00 9.980267e-01 9.921147e-01 9.822873e-01 9.685832e-01
[31] 9.510565e-01 9.297765e-01 9.048271e-01 8.763067e-01 8.443279e-01 8.090170e-01 7.705132e-01 7.289686e-01 6.845471e-01 6.374240e-01
[41] 5.877853e-01 5.358268e-01 4.817537e-01 4.257793e-01 3.681246e-01 3.090170e-01 2.486899e-01 1.873813e-01 1.253332e-01 6.279052e-02
[51] -2.542075e-15 -6.279052e-02 -1.253332e-01 -1.873813e-01 -2.486899e-01 -3.090170e-01 -3.681246e-01 -4.257793e-01 -4.817537e-01 -5.358268e-01
[61] -5.877853e-01 -6.374240e-01 -6.845471e-01 -7.289686e-01 -7.705132e-01 -8.090170e-01 -8.443279e-01 -8.763067e-01 -9.048271e-01 -9.297765e-01
[71] -9.510565e-01 -9.685832e-01 -9.822873e-01 -9.921147e-01 -9.980267e-01 -1.000000e+00 -9.980267e-01 -9.921147e-01 -9.822873e-01 -9.685832e-01
[81] -9.510565e-01 -9.297765e-01 -9.048271e-01 -8.763067e-01 -8.443279e-01 -8.090170e-01 -7.705132e-01 -7.289686e-01 -6.845471e-01 -6.374240e-01
[91] -5.877853e-01 -5.358268e-01 -4.817537e-01 -4.257793e-01 -3.681246e-01 -3.090170e-01 -2.486899e-01 -1.873813e-01 -1.253332e-01 -6.279052e-02
And here is the resulting absolute values of the fft frequency domain
[1] 7.160038e-13 1.008741e-01 2.080408e-01 3.291725e-01 4.753899e-01 6.653660e-01 9.352601e-01 1.368212e+00 2.211653e+00 4.691243e+00 5.001674e+02
[12] 5.293086e+00 2.742218e+00 1.891330e+00 1.462830e+00 1.203175e+00 1.028079e+00 9.014559e-01 8.052577e-01 7.294489e-01
I'm a little rusty too on math and signal processing but with the additional info I can give it a shot.
If you want to know the signal energy per bin you need the magnitude of the complex output. So just looking at the real output is not enough. Even when the input is only real numbers. For every bin the magnitude of the output is sqrt(real^2 + imag^2), just like pythagoras :-)
bins 0 to 449 are positive frequencies from 0 Hz to 500 Hz. bins 500 to 1000 are negative frequencies and should be the same as the positive for a real signal. If you process one buffer every second frequencies and array indices line up nicely. So the peak at index 6 corresponds with 6Hz so that's a bit strange. This might be because you're only looking at the real output data and the real and imaginary data combine to give an expected peak at index 10. The frequencies should map linearly to the bins.
The peaks at 0 indicates a DC offset.
It's been some time since I've done FFT's but here's what I remember
FFT usually takes complex numbers as input and output. So I'm not really sure how the real and imaginary part of the input and output map to the arrays.
I don't really understand what you're doing. In the first example you say you process sample buffers at 10Hz for a sample rate of 1000 Hz? So you should have 10 buffers per second with 100 samples each. I don't get how your input array can be at least 228 samples long.
Usually the first half of the output buffer are frequency bins from 0 frequency (=dc offset) to 1/2 sample rate. and the 2nd half are negative frequencies. if your input is only real data with 0 for the imaginary signal positive and negative frequencies are the same. The relationship of real/imaginary signal on the output contains phase information from your input signal.
The frequency for bin i is i * (samplerate / n), where n is the number of samples in the FFT's input window.
If you're handling audio, since pitch is proportional to log of frequency, the pitch resolution of the bins increases as the frequency does -- it's hard to resolve low frequency signals accurately. To do so you need to use larger FFT windows, which reduces time resolution. There is a tradeoff of frequency against time resolution for a given sample rate.
You mention a bin with a large value at 0 -- this is the bin with frequency 0, i.e. the DC component. If this is large, then presumably your values are generally positive. Bin n/2 (in your case 256) is the Nyquist frequency, half the sample rate, which is the highest frequency that can be resolved in the sampled signal at this rate.
If the signal is real, then bins n/2+1 to n-1 will contain the complex conjugates of bins n/2-1 to 1, respectively. The DC value only appears once.
The samples are, as others have said, equally spaced in the frequency domain (not logarithmic).
For example 1, you should get this:
alt text http://home.comcast.net/~kootsoop/images/SINE1.jpg
For the other example you should get
alt text http://home.comcast.net/~kootsoop/images/SINE2.jpg
So your answers both appear to be correct regarding the peak location.
What I'm not getting is the large DC component. Are you sure you are generating a sine wave as the input? Does the input go negative? For a sinewave, the DC should be close to zero provided you get enough cycles.
Another avenue is to craft a Goertzel's Algorithm of each note center frequency you are looking for.
Once you get one implementation of the algorithm working you can make it such that it takes parameters to set it's center frequency. With that you could easily run 88 of them or what ever you need in a collection and scan for the peak value.
The Goertzel Algorithm is basically a single bin FFT. Using this method you can place your bins logarithmically as musical notes naturally go.
Some pseudo code from Wikipedia:
s_prev = 0
s_prev2 = 0
coeff = 2*cos(2*PI*normalized_frequency);
for each sample, x[n],
s = x[n] + coeff*s_prev - s_prev2;
s_prev2 = s_prev;
s_prev = s;
end
power = s_prev2*s_prev2 + s_prev*s_prev - coeff*s_prev2*s_prev;
The two variables representing the previous two samples are maintained for the next iteration. This can be then used in a streaming application. I thinks perhaps the power calculation should be inside the loop as well. (However it is not depicted as such in the Wiki article.)
In the tone detection case there would be 88 different coeficients, 88 pairs of previous samples and would result in 88 power output samples indicating the relative level in that frequency bin.
WaveyDavey says that he's capturing sound from a mic, thru the audio hardware of his computer, BUT that his results are not zero-centered. This sounds like a problem with the hardware. It SHOULD BE zero-centered.
When the room is quiet, the stream of values coming from the sound API should be very close to 0 amplitude, with slight +- variations for ambient noise. If a vibratory sound is present in the room (e.g. a piano, a flute, a voice) the data stream should show a fundamentally sinusoidal-based wave that goes both positive and negative, and averages near zero. If this is not the case, the system has some funk going on!
-Rick