Introducing packet/slice loss on an HEVC bitstream

I'm doing a study on HEVC and am very new to video compression. The first thing I want to do is observe the effect of packet loss on the decoded video, so I want to modify the decoder so that I can introduce packet loss into the HEVC bitstream. I'll be using error patterns generated by NS2. What part of the decoder should I focus on? How can I apply the error patterns to the HEVC bitstream? Which specific variables determine the frame and slice number? I'm using HM 16.6. Thanks

I once developed a Python tool that hacks into the bitstream and flips bits. I read the bitstream file generated by the encoder linearly and randomized the bit-flipping process. Because I knew the structure of the NAL units from the standard specification, I could tell where each corrupted bit landed. The best place to start manipulating is the NAL unit headers: the video, sequence, and slice headers. You can tell where they are from the packetization parameters. It has been a long time, so I have forgotten the details. The headers do not tell you everything about the bitstream, in particular the frame and slice number; the partitioning could be slice-based or tile-based (or something else I have forgotten), but that much you can tell from the headers. The decoder works out the frame and slice number as it decodes, following the encoder's signalling, so everything you need should be in the headers. Check the latest working draft and study the header information.
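For illustration, here is a minimal Python sketch of that bit-flipping idea (not my original tool; file names are placeholders). It treats the file as an Annex B byte stream in which every NAL unit is preceded by a 0x000001 start code, possibly with one extra leading zero byte, and the 2-byte HEVC NAL unit header follows immediately after:

    import random

    def find_nal_headers(data: bytes):
        """Return offsets of the first NAL-unit-header byte after each start code."""
        offsets, i = [], 0
        while (i := data.find(b"\x00\x00\x01", i)) != -1:
            offsets.append(i + 3)
            i += 3
        return offsets

    def flip_random_bit(in_path: str, out_path: str, seed: int = 1) -> None:
        rng = random.Random(seed)
        data = bytearray(open(in_path, "rb").read())
        header = rng.choice(find_nal_headers(bytes(data)))
        # Corrupt one bit within the first few bytes of the chosen NAL unit
        # (its header plus the start of the payload), clamped to the file end.
        pos = min(header + rng.randrange(8), len(data) - 1)
        data[pos] ^= 1 << rng.randrange(8)
        open(out_path, "wb").write(data)

    flip_random_bit("stream.bin", "stream_corrupt.bin")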

I am posting this as an answer rather than a comment since I don't have the reputation to comment.
In HEVC, a slice is a group of consecutive CTUs (Coding Tree Units) within a frame. Furthermore, HEVC introduces the splitting of slices into so-called slice segments, mainly for low-delay applications. When a slice is split into multiple slice segments, only the first slice segment carries the header information required to decode the whole slice; the remaining dependent slice segments refer back to that initial slice segment during decoding. Each slice segment (or each slice, if no segmentation is used) maps to a single data unit called a NAL (Network Abstraction Layer) unit. A NAL unit is the video packet payload in the physical channel, so a packet loss during transmission corresponds to a NAL unit loss and consequently a slice loss.
If you study the HEVC standard you will encounter another type of data unit called an access unit. An access unit is a collection of NAL units (hence a collection of slices). In the byte stream, start codes are placed in front of NAL units so that each one can be identified separately: in an HEVC Annex B stream, the first NAL unit of an access unit is prefixed with the four-byte start code 0x00 00 00 01, whereas subsequent NAL units are prefixed with the three-byte start code 0x00 00 01. So if you want to introduce a packet loss in a given frame, first identify the correct frame by counting the four-byte start codes in the bitstream, then delete all the bits between the start codes surrounding the desired NAL unit.
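As a hedged sketch of that counting-and-deleting procedure (file names and indices below are placeholders), dropping one whole NAL unit from an Annex B stream might look like this in Python:

    def start_codes(data: bytes):
        """Yield (offset, is_four_byte) for every start code in the stream."""
        i = 0
        while (i := data.find(b"\x00\x00\x01", i)) != -1:
            four = i > 0 and data[i - 1] == 0x00
            yield (i - 1 if four else i, four)
            i += 3

    def drop_nal(data: bytes, au_index: int, nal_index: int) -> bytes:
        """Remove the nal_index-th NAL unit of the au_index-th access unit."""
        codes = list(start_codes(data))
        aus = []                          # group start codes into access units:
        for pos, four in codes:           # a 4-byte code opens a new access unit
            if four or not aus:
                aus.append([])
            aus[-1].append(pos)
        target = aus[au_index][nal_index]
        positions = [pos for pos, _ in codes]
        nxt = positions.index(target) + 1
        end = positions[nxt] if nxt < len(positions) else len(data)
        return data[:target] + data[end:]  # splice the whole NAL unit out

    with open("stream.bin", "rb") as f:
        clean = f.read()
    with open("stream_loss.bin", "wb") as f:
        f.write(drop_nal(clean, au_index=3, nal_index=0))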

Related

Extracting motion-compensated frames during HEVC encoding

I am trying to analyze H.265 coding performance. Is there a way to export the predicted frames for H.265/HEVC encoding? Specifically, how should I obtain reconstructed frames after compensating with the motion vectors, but before applying the residual? Is there a way to do this with ffmpeg, or any other codec analysis tool?
Yes, you can do this with the HM decoder.
What you need to do is find the exact line of code in the TDecCu.cpp file where the two pointers piResi and piPred are added together to reconstruct the block. There, you can dump piPred alone.

Manchester decoding with variable size frames

I'm attempting to decode a Manchester-encoded packet using GNU Radio Companion. I've been following this example, where the author decodes packets from a Somfy window-blinds remote. From what I've read in that article and this mailing list, the Viterbi Combo block is the way to perform Manchester decoding.
The Viterbi Combo block requires the block size (frame size) beforehand. This isn't a problem in the first article because the frame size is fixed and known in advance. In my case, however, the frame size is variable and conveyed in the first octet of the header, i.e.
[preamble][sync][header][data][crc]
There are several blocks for extracting length information out of a stream, but these assume that the Manchester decoding has already been done. Is there a way to do Manchester decoding without knowing the block size? Is this a case where I need to write my own custom block?
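For what it's worth, Manchester decoding itself doesn't require the frame size: each data bit is just a pair of half-bit symbols, so a stream can be decoded pair-by-pair and the length field read as soon as the header octet is complete. A minimal Python sketch of that idea, assuming the IEEE 802.3 convention (0 encoded as 01, 1 encoded as 10) and an already clock-recovered symbol stream:

    def manchester_decode(halfbits):
        """Decode an iterable of half-bit symbols (0/1) pair-by-pair.

        Assumes the IEEE 802.3 convention (data 0 -> symbols 0,1; data 1 -> 1,0);
        some devices use the inverse (G.E. Thomas) convention instead.
        """
        it = iter(halfbits)
        for first, second in zip(it, it):   # consume two symbols per data bit
            if (first, second) == (0, 1):
                yield 0
            elif (first, second) == (1, 0):
                yield 1
            else:
                raise ValueError("invalid Manchester pair: check symbol alignment")

    print(list(manchester_decode([1, 0, 1, 0, 0, 1, 1, 0])))  # -> [1, 1, 0, 1]

In GRC terms this would be a small custom block whose general_work keeps its pair-alignment state across buffers, rather than a fixed-block-size decoder.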

QR code-like alternative with extremely low error rate and ability to read bent codes

I'm trying to find an alternative to QR codes (I'd also be willing to accept an entirely novel solution and implement it myself) that meets certain specifications.
First, the codes will often end up on thin pipes, and so need to be readable around a cylinder. The advantage to this is that the effect on the image from wrapping it around a cylinder is easy to express geometrically, and the codes will never be placed on a very irregular shape.
Second, read accuracy must be very high, as any read mistake would be extremely costly. If this means larger codes with more redundancy for better error correction, so be it.
Third, ability to be read by the average smartphone camera from a few inches out.
Fourth, storage space of around half a kilobyte per code.
Do you know of such a code?
The Data Matrix Rectangular Extension (DMRE) improves upon the standard set of rectangular Data Matrix symbol sizes in an algorithmically compatible manner, thus increasing the range of suitable applications with no real downsides.
Reliable cylindrical marking is a primary use case.
Regardless of symbology, you will be unable to approach sufficient data density to achieve 0.5 KB of binary data in a single compact, narrow symbol scanned using a standard camera phone. However, most 2D symbologies (DMRE included) support a feature called Structured Append that allows chaining of multiple symbols that can be scanned in any order to produce a single read when all components are accounted for.
If the data to be encoded is known to be highly structured (e.g. mostly numeric or alphanumeric) then the internal encoding process of Data Matrix will be more efficient than for general binary data. For example, the largest DMRE symbol (26×64) will hold up to 236 numeric characters, about 175 alphanumeric characters, but only 116 bytes of general binary data.
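To put that in numbers using the capacities quoted above, a half-kilobyte binary payload would need roughly five chained 26×64 symbols:

    import math

    payload = 512        # "around half a kilobyte per code"
    per_symbol = 116     # binary capacity of the largest DMRE symbol (26x64)
    print(math.ceil(payload / per_symbol))  # -> 5 symbols via Structured Append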
If the default error recovery rate is insufficient then including a checksum in the data may be appropriate.
DMRE has just been voted to be accepted as an ISO/IEC project and will likely become an international standard enjoying broad hardware and software support in due course.
Another option may be to investigate PDF417, which has a broader range of symbol sizes; however, its data density is somewhat lower than that of Data Matrix.
DMRE references: AIM specification and explanatory notes.

Automatic feature extraction from chess board positions

I am working on a project where I take a chess board position (a FEN string converted to binary) and its evaluation score and feed them to a neural network. My aim is to make the neural network differentiate between good and bad positions.
How I encode the position: there are 12 unique pieces in chess, i.e. pawn, rook, knight, bishop, queen and king, for white as well as black. I encode each piece using 4 bits, with 0000 denoting an empty square. So the 64 squares are encoded into 256 bits, and I use 6 more bits to denote game state, such as whose turn it is to move, castling status, etc.
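A hedged Python sketch of this encoding (the piece-to-nibble mapping is an arbitrary choice for illustration; any fixed mapping works):

    # 0b0000 is an empty square; the 12 piece types get codes 1..12.
    PIECE_CODES = {p: i + 1 for i, p in enumerate("PNBRQKpnbrqk")}

    def encode_position(fen: str) -> str:
        board = fen.split()[0]                      # first FEN field: piece placement
        bits = []
        for ch in board:
            if ch == "/":                           # rank separator, no bits
                continue
            if ch.isdigit():                        # run of empty squares
                bits.append("0000" * int(ch))
            else:
                bits.append(format(PIECE_CODES[ch], "04b"))
        return "".join(bits)                        # 64 squares x 4 bits = 256 bits

    start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    assert len(encode_position(start)) == 256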
Problem: since the input space for chess positions is neither smooth nor uni-modal (one small change in the board position can result in a huge change in the evaluation score), the neural network doesn't learn well. The next logical step is to somehow extract useful features (like material difference, center control, etc.) and feed those to the network.
I do not want to hand-pick the features, as I want the network to learn everything by itself. Therefore I am thinking of extracting features automatically using autoencoders. Is there a better way to accomplish this?
Summary : What is the best way to automatically extract features from a chess board position so that it can be fed into a neural network?
UPDATE: To generate training data, I have modified Stockfish to dump its evaluation process into a log file, so every new move (position) it considers is written to the file as a FEN string along with its eval score.
Neural networks can approximate any function. The main consideration is the dimensionality of the input space, which constrains the amount of data you need to get a good approximation.
For a supervised network (you mention autoencoders, so I assume you use some variant of backpropagation), it's difficult for me to imagine how you would do the training using raw positions, because you need similar positions in your training set. Maybe your approach is different, but I'm convinced that the second strategy (using features) is more promising; I think using raw positions requires a huge amount of training data to get good results.
For features, take a look here and at the classical work of Shannon.
I also took useful information from the source code of Crafty.
But you have to extract this information from the FEN string.
Autoencoders are a way to reduce the dimensionality of the data (good, because it improves performance). Using Principal Component Analysis may be even better, as reported here.
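For concreteness, a minimal Keras-style sketch of the autoencoder route (the 262-unit input matches the question's 256 + 6 bits; the 64-unit bottleneck is an arbitrary assumption):

    from tensorflow import keras

    # Train input -> bottleneck -> input; afterwards the bottleneck activations
    # serve as automatically learned features for the evaluation network.
    inputs = keras.Input(shape=(262,))                        # 256 board bits + 6 state bits
    code = keras.layers.Dense(64, activation="relu")(inputs)  # learned feature vector
    outputs = keras.layers.Dense(262, activation="sigmoid")(code)

    autoencoder = keras.Model(inputs, outputs)
    encoder = keras.Model(inputs, code)
    autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

    # X: array of positions encoded as 262-bit vectors (the question's scheme)
    # autoencoder.fit(X, X, epochs=20, batch_size=256)
    # features = encoder.predict(X)   # feed these to the evaluation network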
I hope this helps.

How to detect silence and cut mp3 file without re-encoding using NAudio and .NET

I've been looking for an answer everywhere and I was only able to find some bits and pieces. What I want to do is to load multiple mp3 files (kind of temporarily merge them) and then cut them into pieces using silence detection.
My understanding is that I can use Mp3FileReader for this but the questions are:
1. How do I read, say, 20 seconds of audio from an MP3 file? Do I need to read 20 × reader.WaveFormat.AverageBytesPerSecond bytes? Or should I keep reading frames until the sum of Mp3Frame.SampleCount / Mp3Frame.SampleRate exceeds 20 seconds?
2. How do I actually detect the silence? I would look at an appropriate number of consecutive samples to check whether they are all below some threshold. But how do I access the samples regardless of whether they are 8- or 16-bit, mono or stereo, etc.? Can I decode an MP3 frame directly?
3. After I have detected silence at say sample 10465, how do I map it back to the mp3 frame index to perform the cutting without re-encoding?
Here's the approach I'd recommend (which does involve re-encoding):
Use AudioFileReader to get your MP3 as floating-point samples directly in the Read method
Find an open-source noise gate algorithm, port it to C#, and use that to detect silence (i.e. when the noise gate is closed, you have silence; you'll want to tweak the threshold and attack/release times). A sketch of the gate logic follows after this list.
Create a derived ISampleProvider that uses the noise gate and, in its Read method, does not return samples that are in silence
Either: pass the output into WaveFileWriter to create a WAV file, then encode the WAV file to MP3
Or: use NAudio.Lame to encode directly without the WAV step. You'll probably need to go from the sample provider back down to a 16-bit wave provider first
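The question is .NET-specific, but the detection logic itself is language-agnostic. Here is a simplified Python sketch of the silence-detection step (a plain threshold plus minimum duration, without the attack/release smoothing a real noise gate would add):

    def find_silence(samples, threshold=0.01, min_len=2205):
        """Return (start, end) sample-index pairs of silent stretches.

        samples: mono float samples in [-1.0, 1.0]
        min_len: minimum run length; 2205 samples is ~50 ms at 44.1 kHz
        """
        regions, start = [], None
        for i, s in enumerate(samples):
            if abs(s) < threshold:
                if start is None:
                    start = i
            else:
                if start is not None and i - start >= min_len:
                    regions.append((start, i))
                start = None
        if start is not None and len(samples) - start >= min_len:
            regions.append((start, len(samples)))
        return regions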
BEFORE READING BELOW: Mark's answer is far easier to implement, and you'll almost certainly be happy with the results. This answer is for those who are willing to spend an inordinate amount of time on it.
So, with that said: cutting an MP3 file based on silence without re-encoding or full decoding is actually possible... Basically, you can look at each frame's side info and each granule's gain and Huffman data to "estimate" the silence.
Find the silence
Copy all the frames from before the silence to a new file
Now it gets tricky:
Pull the audio data from the frames after the silence, keeping track of which frame header goes with what audio data.
Start writing the second new file, but as you write out the frames, update the main_data_begin field so the bit reservoir is in sync with where the audio data really is.
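A hedged Python sketch of the frame-walking part (MPEG-1 Layer III only; ID3 tags, free-format streams, and the Xing/VBR header are ignored):

    # An MPEG-1 Layer III frame starts with an 11-bit sync word, and its size
    # is 144 * bitrate / samplerate + padding bytes, so frames can be copied whole.
    BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
    SAMPLERATES = [44100, 48000, 32000]

    def mp3_frames(data: bytes):
        """Yield (offset, length) for each MP3 frame found in the byte stream."""
        i = 0
        while i + 4 <= len(data):
            if data[i] == 0xFF and (data[i + 1] & 0xE0) == 0xE0:
                br_idx = data[i + 2] >> 4
                sr_idx = (data[i + 2] >> 2) & 0x3
                if br_idx in (0, 15) or sr_idx == 3:   # free-format / invalid
                    i += 1
                    continue
                padding = (data[i + 2] >> 1) & 0x1
                length = 144 * BITRATES_KBPS[br_idx] * 1000 // SAMPLERATES[sr_idx] + padding
                yield (i, length)
                i += length
            else:
                i += 1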
MP3 is a compressed audio format, so you can't just cut bits out and expect the remainder to still be a valid MP3 file. In fact, since it's an MDCT-based transform codec, the bits live in the frequency domain rather than the time domain. There simply are no bits for sample 10465; there's a frame which contains sample 10465, and a set of bits describing all the frequencies in that frame.
Simply cutting the audio at sample 10465 and continuing with some other, unrelated sample will probably cause a discontinuity, which means the number of frequencies present in the resulting frame skyrockets. So that definitely means a full recode. The better way is to smooth the transition, but that's not a trivial operation, and the result is of course slightly different from the input, so it still means a recode.
I don't understand why you'd want to read 20 seconds of audio anyway. Where's that number coming from? You usually want to read everything.
Sound is a wave; it's entirely expected that it crosses zero, so a single sample being close to zero isn't special. For a 20 Hz wave (the low end of human hearing), zero crossings happen 40 times per second, and each time you'll have multiple samples near zero. So you basically need a run of samples that are all close to zero, on both sides of the crossing. Values like 5, 6, 7 aren't much for 16-bit audio, but they might well be part of a wave that peaks at 10000. You really should check an interval of at least 0.05 seconds to catch those 20 Hz sounds.
Since you detected silence over a 50-millisecond interval, you have a "position" that is many hundreds of samples wide. With a bit of luck, there's a frame boundary in there; cut there. Otherwise, it's time for re-encoding.