AWK - Select lines to print according to score

I have a tab-separated file containing a series of lemmas with associated scores.
The file has 5 columns: the first is the lemma and the third is the score. I need to print a line unchanged when its lemma is not repeated, and print only the line with the highest score when the lemma is repeated.
Input
Lemma --- Score --- ---
cserép 06a 55 6 bueno
darázs 05 38 1 bueno
dél 06a 34 1 bueno
dér 06a 29 1 bueno
díj 05 14 89 malo
díj 06a 2 101 malo
díj 06b 2 101 malo
díj 07 90 13 bueno
díj 08a 2 101 malo
díj 08b 2 101 malo
egér 06a 66 5 bueno
fonal 05 12 1 bueno
fonal 07 52 4 bueno
Desired output
Lemma --- Score --- ---
cserép 06a 55 6 bueno
darázs 05 38 1 bueno
dél 06a 34 1 bueno
dér 06a 29 1 bueno
díj 07 90 13 bueno
egér 06a 66 5 bueno
fonal 07 52 4 bueno
This is what I have so far, but it only works when a lemma is repeated once.
BEGIN {
OFS=FS="\t";
flag="";
}
{
id=$1;
if (id != flag)
{
if (line != "")
{
sub("^;","",line);
z=split(line,A,";");
if ((A[3] > A[8]) && (A[8] != ""))
{
print A[1]"\t"A[2]"\t"A[3]"\t"A[4]"\t"A[5];
}
else if ((A[8] > A[3]) && (A[8] != ""))
{
print A[6]"\t"A[7]"\t"A[8]"\t"A[9]"\t"A[10]
}
else
{
print A[1]"\t"A[2]"\t"A[3]"\t"A[4]"\t"A[5];
}
}
delete line;
flag=id;
}
line[$1]=line[$1]";"$2";"$3";"$4";"$5;
}
END {
line=line ";"$1";"$2";"$3";"$4";"$5
sub("^;","",line);
z=split(line,A,";");
if ((A[3] > A[8]) && (A[8] != ""))
{
print A[1]"\t"A[2]"\t"A[3]"\t"A[4]"\t"A[5];
}
else if ((A[8] > A[3]) && (A[8] != ""))
{
print A[6]"\t"A[7]"\t"A[8]"\t"A[9]"\t"A[10]
}
else
{
print A[1]"\t"A[2]"\t"A[3]"\t"A[4]"\t"A[5]
}
}

This one doesn't require the file to be sorted by lemma, but it keeps every line to be printed in memory (one per lemma), so it may not be appropriate for a file with millions of different lemmas.
It also does not preserve the order of the original file.
Finally, it assumes that all scores are positive: a lemma whose scores are all zero or negative is never stored, because an unset score[$1] compares as 0.
$ cat lemma.awk
BEGIN { FS = OFS = "\t" }
NR == 1 { print }
NR > 1 {
if ($3 > score[$1]) {
score[$1] = $3
line[$1] = $0
}
}
END { for (lemma in line) print line[lemma] }
$ awk -f lemma.awk lemma.txt
Lemma --- Score --- ---
cserép 06a 55 6 bueno
díj 07 90 13 bueno
fonal 07 52 4 bueno
darázs 05 38 1 bueno
egér 06a 66 5 bueno
dél 06a 34 1 bueno
dér 06a 29 1 bueno
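The two limitations noted for this approach (lost input order, positive scores only) can be avoided while still not requiring sorted input. The following is a minimal sketch in POSIX awk, not part of the original answer; the wrapper name best_per_lemma and the arrays order, best and score are names chosen here for illustration. It assumes a tab-separated file with a header on the first line.

```shell
# best_per_lemma: hypothetical helper name, not from the answers above.
best_per_lemma() {
    awk '
    BEGIN { FS = OFS = "\t" }
    NR == 1 { print; next }            # pass the header line through
    !($1 in best) {                    # first time this lemma is seen
        order[++n] = $1                # remember first-appearance order
        best[$1] = $0; score[$1] = $3
        next
    }
    $3 + 0 > score[$1] + 0 {           # strictly higher score replaces the stored line
        best[$1] = $0; score[$1] = $3
    }
    END { for (i = 1; i <= n; i++) print best[order[i]] }
    ' "$@"
}

# Small demo with made-up data, including negative scores
printf 'Lemma\tTag\tScore\na\t05\t-3\nb\t05\t14\nb\t07\t90\na\t06\t-1\n' | best_per_lemma
```

The demo prints the header, then the line "a 06 -1", then "b 07 90" (tab-separated). It still holds one line per lemma in memory, so the memory caveat above applies unchanged.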

Tested with GNU awk:
prevLemma != $1 {
if( prevLemma ) {
print line;
}
prevLemma = $1;
prevScore = $3;
line = $0;
}
prevLemma == $1 { if( prevScore < $3 ) {
prevScore = $3;
line = $0;
}
}
END { print line;}
The assumption is that the file is sorted by lemma.
When the lemma changes (or at the very beginning, when prevLemma is still empty), the lemma, score and line are saved.
When the lemma changes (or in the END block), the saved line for the previous lemma is printed.
When the current line belongs to the same lemma and has a higher score, the score and line are saved again.

$ cat tst.awk
$1 != prev { printf "%s", maxLine; maxLine=""; max=$3; prev=$1 }
$3 >= max { max=$3; maxLine=$0 ORS }
END { printf "%s", maxLine }
$ awk -f tst.awk file
Lemma --- Score --- ---
cserép 06a 55 6 bueno
darázs 05 38 1 bueno
dél 06a 34 1 bueno
dér 06a 29 1 bueno
díj 07 90 13 bueno
egér 06a 66 5 bueno
fonal 07 52 4 bueno

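Use a script (this sketch assumes the file is sorted by lemma):
$1 != prev { if (NR > 1) print best; prev = $1; max = $3; best = $0; next } # new lemma: print the best line of the previous one
$3 + 0 > max + 0 { max = $3; best = $0 } # same lemma with a higher score
END { if (NR) print best }
Actually, this might be better done with perl.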

Related

Counting max and min per row across columns and outputting associated column names

I'm trying to count both max and min (except 0s) per row across columns and outputting associated column names.
I'm trying this:
BEGIN{OFS="\t"}
NR==1{print $1,$2,"ref","max","ref","min";
for(i=3;i<=6;++i)BASES[i]=$(i);
}
NR>1{l=1;basemax=BASES[3];basemin=BASES[3]; max=$3; min=$3;
for(i=4;i<=6;++i){
if($i>max){basemax=BASES[i];max=$i;}
else if($i==max){basemax=basemax","BASES[i];++l}
}
for(i=4;i<=6;++i){
if($i<min && $i !=0){basemmin=BASES[i];mim=$i}
else if($i==min){basemin=basemin","BASES[i];++l}
}
print $1,$2,basemax,max,basemin,min
}
On an input that looks like this
chr pos C T A G
NC_044998.1 3732 22 0 7 0
NC_044998.1 3733 22 0 0 0
NC_044998.1 3734 22 3 3 0
NC_044998.1 3735 22 0 0 3
NC_044998.1 3736 0 7 22 3
NC_044998.1 3737 0 0 0 25
NC_044998.1 3738 22 7 0 0
NC_044998.1 3739 7 3 22 25
NC_044998.1 3740 0 22 22 0
NC_044998.1 3741 22 0 0 0
The desired output is
chr pos ref max ref min
NC_044998.1 3732 C 22 A 7
NC_044998.1 3733 C 22 C 22
NC_044998.1 3734 C 22 T,A 3
NC_044998.1 3735 C 22 G 3
NC_044998.1 3736 A 22 G 3
NC_044998.1 3737 G 25 G 25
NC_044998.1 3738 C 22 C 22
NC_044998.1 3739 G 25 C 7
NC_044998.1 3740 T,A 22 T,A 22
NC_044998.1 3741 C 22 C 22
But it outputs this instead
chr pos ref max ref min
NC_044998.1 3732 C 22 C 22
NC_044998.1 3733 C 22 C 22
NC_044998.1 3734 C 22 C 22
NC_044998.1 3735 C 22 C 22
NC_044998.1 3736 A 22 C 0
NC_044998.1 3737 G 25 C,T,A 0
NC_044998.1 3738 C 22 C 22
NC_044998.1 3739 G 25 C 7
NC_044998.1 3740 T 22 C,A,G 0
NC_044998.1 3741 C 22 C 22
With your shown samples, please try the following awk code. Written and tested in GNU awk.
awk -v startField="3" -v endField="6" '
BEGIN{ OFS="\t"; print "chr pos ref max ref min"}
FNR==1{
for(i=startField;i<=endField;i++){
heading[i]=$i
}
next
}
{
min=max2=maxInd2=minInd=max=maxInd=minAllInd=maxAllInd=maxAllInd2=""
for(i=startField;i<=endField;i++){
if($i!=0){
minInd=(min>$i?i:(min==$i?minInd","i:(minInd!=""?minInd:i)))
min=(min>$i?$i:(min!=""?min:$i))
}
maxInd=(max<$i?i:(max==$i?maxInd","i:(maxInd!=""?maxInd:i)))
max=(max<$i?$i:(max!=""?max:$i))
}
for(i=startField+1;i<=endField;i++){
maxInd2=(max2<$i?i:(max2==$i?maxInd2","i:(maxInd2!=""?maxInd2:i)))
max2=(max2<$i?$i:(max2!=""?max2:$i))
}
num1=split(maxInd,arr1,",")
num2=split(minInd,arr2,",")
num3=split(maxInd2,arr3,",")
if(num1>1){
for(k=1;k<=num1;k++){
maxAllInd = (maxAllInd?maxAllInd ",":"") heading[arr1[k]]
}
}
else{
maxAllInd = heading[maxInd]
}
if(num2>1){
for(k=1;k<=num2;k++){
minAllInd = (minAllInd?minAllInd ",":"") heading[arr2[k]]
}
}
else{
minAllInd = heading[minInd]
}
if(num3>1){
for(k=1;k<=num3;k++){
maxAllInd2 = (maxAllInd2?maxAllInd2 ",":"") heading[arr3[k]]
}
}
else{
maxAllInd2 = heading[maxInd2]
}
if(startField>1){
NF=(startField-1)
if(min !=0 ){
print $0,maxAllInd,max,minAllInd,min
}
if(min == 0 && max2 != 0){
print $0,maxAllInd,max,maxAllInd2,max2
}
if(min == 0 && max2 == 0){
print $0,maxAllInd,max,maxAllInd,max
}
}
else{
if(min !=0 ){
print maxAllInd,max,minAllInd,min
}
if(min == 0 && max2 != 0){
print maxAllInd,max,maxAllInd2,max2
}
if(min == 0 && max2 == 0){
print maxAllInd,max,maxAllInd,max
}
}
}
' Input_file
This awk script should work for you:
cat maxmin.awk
NR == 1 {
for (i=b; i<=NF; ++i)
hdr[i] = $i
print $1, $2, "ref", "max", "ref", "min"
next
}
{
for (i=b; i<=NF; ++i) {
max = ($i > max ? $i : max)
min = ($i && (min == "" || $i < min) ? $i : min)
}
for (i=b; i<=NF; ++i) {
if ($i == min)
rmin = (rmin ? rmin "," : "") hdr[i]
if ($i == max)
rmax = (rmax ? rmax "," : "") hdr[i]
}
print $1, $2, rmax, max, rmin, min
max = min = rmax = rmin = ""
}
And use it as:
awk -v b=3 -f maxmin.awk gg | column -t
chr pos ref max ref min
NC_044998.1 3732 C 22 A 7
NC_044998.1 3733 C 22 C 22
NC_044998.1 3734 C 22 T,A 3
NC_044998.1 3735 C 22 G 3
NC_044998.1 3736 A 22 G 3
NC_044998.1 3737 G 25 G 25
NC_044998.1 3738 C 22 T 7
NC_044998.1 3739 G 25 T 3
NC_044998.1 3740 T,A 22 T,A 22
NC_044998.1 3741 C 22 C 22
column -t has been used for tabular output only.
You have some typos in variable names such as basemmin and mim.
If the count of C is 0, the min value has no chance to be updated.
You can combine the two for loops into one.
The variable l is not used.
Then would you please try the following:
awk -v OFS="\t" '
NR==1 {
print $1, $2, "ref", "max", "ref", "min"
for (i = 3; i <= 6; i++) bases[i] = $i
}
NR>1 {
basemax = bases[3]; basemin = bases[3]; max = $3; min = $3
for (i = 4; i <= 6; i++) {
if ($i > max) {basemax = bases[i]; max = $i}
else if ($i == max) {basemax = basemax "," bases[i]}
if ($i < min && $i != 0 || min == 0) {basemin = bases[i]; min = $i}
else if ($i == min) {basemin = basemin "," bases[i]}
}
print $1, $2, basemax, max, basemin, min
}' input_file
Output:
chr pos ref max ref min
NC_044998.1 3732 C 22 A 7
NC_044998.1 3733 C 22 C 22
NC_044998.1 3734 C 22 T,A 3
NC_044998.1 3735 C 22 G 3
NC_044998.1 3736 A 22 G 3
NC_044998.1 3737 G 25 G 25
NC_044998.1 3738 C 22 T 7
NC_044998.1 3739 G 25 T 3
NC_044998.1 3740 T,A 22 T,A 22
NC_044998.1 3741 C 22 C 22
Please note the output slightly differs from your desired output, which may contain typos.

For each unique occurrence in field, transform each unique occurrence in another field in a different column

I have a file
splice_region_variant,intron_variant A1CF 1
3_prime_UTR_variant A1CF 18
intron_variant A1CF 204
downstream_gene_variant A1CF 22
synonymous_variant A1CF 6
missense_variant A1CF 8
5_prime_UTR_variant A2M 1
stop_gained A2M 1
missense_variant A2M 15
splice_region_variant,intron_variant A2M 2
synonymous_variant A2M 2
upstream_gene_variant A2M 22
intron_variant A2M 308
missense_variant A4GNT 1
intron_variant A4GNT 21
5_prime_UTR_variant A4GNT 3
3_prime_UTR_variant A4GNT 7
This file is sorted by $2.
For each unique value in $2, I want to turn each unique value of $1 into a column holding the corresponding value from $3, or 0 if there is no such record, so that I get:
splice_region_variant,intron_variant 3_prime_UTR_variant intron_variant downstream_gene_variant synonymous_variant missense_variant 5_prime_UTR_variant stop_gained upstream_gene_variant
A1CF 1 18 204 22 6 8 0 0 0
A2M 2 0 308 0 2 15 1 1 22
A4GNT 0 7 21 0 0 1 3 0 0
test file:
a x 2
b,c x 4
dd x 3
e,e,t x 5
a b 1
cc b 2
e,e,t b 1
This is what I'm getting:
a b,c dd e,e,t cc
x 5 2 4 3
b 1 2 1
EDIT: This might be doing it but doesn't output 0s in blank fields
gawk 'BEGIN {FS = OFS = "\t"}
NR > 1 {data[$2][$1] = $3; blocks[$1]}
END {
PROCINFO["sorted_in"] = "@ind_str_asc"
# header
printf "gene"
for (block in blocks) {
printf "%s%s", OFS, block
}
print ""
# data
for (ts in data) {
printf "%s", ts
for (block in blocks) {
printf "%s%s", OFS, data[ts][block]
}
print ""
}
}' file
modified from https://unix.stackexchange.com/questions/424642/dynamic-transposing-rows-to-columns-using-awk-based-on-row-value
If you want to print 0 if a certain value is absent, you could do something like this:
val = data[ts][block] ? data[ts][block] : 0;
printf "%s%s", OFS, val
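The same pivot, with 0 printed for missing cells, can also be written without GNU-only features. This is a rough sketch in POSIX awk, not the code from the post above: it uses the classic (row, column) SUBSEP key instead of arrays of arrays, keeps rows and columns in first-appearance order, and assumes tab-separated input with no header line. The names pivot_counts, cols, rows and val are made up for this example.

```shell
# pivot_counts: hypothetical helper name chosen for this sketch.
pivot_counts() {
    awk '
    BEGIN { FS = OFS = "\t" }
    {
        if (!($1 in seen_col)) { seen_col[$1] = 1; cols[++nc] = $1 }
        if (!($2 in seen_row)) { seen_row[$2] = 1; rows[++nr] = $2 }
        val[$2, $1] = $3               # key is row SUBSEP column
    }
    END {
        printf "gene"
        for (c = 1; c <= nc; c++) printf "%s%s", OFS, cols[c]
        print ""
        for (r = 1; r <= nr; r++) {
            printf "%s", rows[r]
            for (c = 1; c <= nc; c++) {
                key = rows[r] SUBSEP cols[c]
                # the "in" test does not create the element, unlike a plain lookup
                printf "%s%s", OFS, ((key in val) ? val[key] : 0)
            }
            print ""
        }
    }
    ' "$@"
}

# Demo on a shortened version of the test file from the question
printf 'a\tx\t2\nb,c\tx\t4\na\tb\t1\ncc\tb\t2\n' | pivot_counts
```

On the demo input this prints the header "gene a b,c cc", then "x 2 4 0" and "b 1 0 2" (tab-separated). Testing membership with in, rather than testing the value, also keeps a genuine stored 0 distinct from a missing record.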

Print sorted output with awk to avoid pipe sort command

I'm trying to match the lines containing (123), then split field 2 by replacing x and + with spaces, which gives 4 columns, and then swap columns 3 and 4.
Finally, the output should be sorted first by column 3 and then by column 4.
I can get that output by piping the awk output into sort, like this:
$ echo "
0: 1920x1663+0+0 kpwr(746)
323: 892x550+71+955 kpwr(746)
211: 891x550+1003+410 kpwr(746)
210: 892x451+71+410 kpwr(746)
415: 891x451+1003+1054 kpwr(746)
1: 894x532+70+330 kpwr(123)
324: 894x532+1001+975 kpwr(123)
2: 894x631+1001+330 kpwr(123)
212: 894x631+70+876 kpwr(123)
61: 892x1+71+375 kpwr(0)
252: 892x1+71+921 kpwr(0)" |
awk '/\(123\)/{b = gensub(/(.+)x(.+)\+(.+)\+(.+)/, "\\1 \\2 \\4 \\3", "g", $2); print b}' |
sort -k3 -k4 -n
894 532 330 70
894 631 330 1001
894 631 876 70
894 532 975 1001
How can I get the same output using only awk without the need to pipe sort? Thanks for any help.
Here is how you can get it from awk (gnu) itself:
awk '/\(123\)/{
$2 = gensub(/(.+)x(.+)\+(.+)\+(.+)/, "\\1 \\2 \\4 \\3", "g", $2)
split($2, a) # split by space and store into array a
# store array by index 3 and 4
rec[a[3]][a[4]] = (rec[a[3]][a[4]] == "" ? "" : rec[a[3]][a[4]] ORS) $2
}
END {
PROCINFO["sorted_in"]="@ind_num_asc" # sort by numeric key ascending
for (i in rec) # print stored array rec
for (j in rec[i])
print rec[i][j]
}' file
894 532 330 70
894 631 330 1001
894 631 876 70
894 532 975 1001
Can you handle GNU awk?:
$ gawk '
BEGIN {
PROCINFO["sorted_in"]="@val_num_asc" # for order strategy
}
/\(123\)$/ { # pick records
split($2,t,/[+x]/) # split 2nd field
if((t[4] in a) && (t[3] in a[t[4]])) { # if index collision
n=split(a[t[4]][t[3]],u,ORS) # split stacked element
u[n+1]=t[1] OFS t[2] OFS t[4] OFS t[3] # add new data
delete a[t[4]][t[3]] # del before rebuilding
for(i in u) # sort on whole record
a[t[4]][t[3]]=a[t[4]][t[3]] ORS u[i] # restack to element
} else
a[t[4]][t[3]]=t[1] OFS t[2] OFS t[4] OFS t[3] # no collision, just add
}
END {
PROCINFO["sorted_in"]="@ind_num_asc" # strategy on output
for(i in a)
for(j in a[i])
print a[i][j]
}' file
Output:
894 532 330 70
894 631 330 1001
894 631 876 70
894 532 975 1001
With colliding data like:
1: 894x532+70+330 kpwr(123) # this
1: 123x456+70+330 kpwr(123) # and this, notice order
324: 894x532+1001+975 kpwr(123)
2: 894x631+1001+330 kpwr(123)
212: 894x631+70+876 kpwr(123)
output would be:
123 456 330 70 # ordered by the whole record when collision
894 532 330 70
894 631 330 1001
894 631 876 70
894 532 975 1001
I was almost done writing mine and my solution was the same as @anubhava's, so I am adding a small tweak to his solution :) This one also handles multiple lines with the same values.
awk '
BEGIN{
PROCINFO["sorted_in"]="@ind_num_asc"
}
/\(123\)/{
$2 = gensub(/(.+)x(.+)\+(.+)\+(.+)/, "\\1 \\2 \\4 \\3", "g", $2)
split($2, a," ")
arr[a[3]][a[4]] = (arr[a[3]][a[4]]!=""?arr[a[3]][a[4]] ORS:"")$2
}
END {
for (i in arr){
for (j in arr[i]){ print arr[i][j] }
}
}' Input_file
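All of the answers above rely on GNU awk (gensub, arrays of arrays, PROCINFO["sorted_in"]). For an awk without those extensions, one possible sketch is to collect the records and sort them by hand in the END block. This is an illustration only, not taken from the answers: the wrapper name sorted_geometry and the arrays k1, k2 and rec are invented here, and the insertion sort is quadratic, so it suits small inputs like the one shown.

```shell
# sorted_geometry: hypothetical helper name chosen for this sketch.
sorted_geometry() {
    awk '
    /\(123\)/ {
        split($2, t, /[x+]/)                 # t[1]=width t[2]=height t[3]=x t[4]=y
        n++
        k1[n] = t[4] + 0; k2[n] = t[3] + 0   # sort keys: y first, then x
        rec[n] = t[1] " " t[2] " " t[4] " " t[3]
    }
    END {
        # stable insertion sort on the numeric pair (k1, k2)
        for (i = 2; i <= n; i++) {
            a = k1[i]; b = k2[i]; r = rec[i]
            for (j = i - 1; j >= 1 && (k1[j] > a || (k1[j] == a && k2[j] > b)); j--) {
                k1[j+1] = k1[j]; k2[j+1] = k2[j]; rec[j+1] = rec[j]
            }
            k1[j+1] = a; k2[j+1] = b; rec[j+1] = r
        }
        for (i = 1; i <= n; i++) print rec[i]
    }
    ' "$@"
}

# Demo with the (123) lines from the question plus one non-matching line
printf '%s\n' \
    '1: 894x532+70+330 kpwr(123)' \
    '324: 894x532+1001+975 kpwr(123)' \
    '2: 894x631+1001+330 kpwr(123)' \
    '212: 894x631+70+876 kpwr(123)' \
    '61: 892x1+71+375 kpwr(0)' | sorted_geometry
```

The demo prints the same four lines, in the same order, as the sort pipeline in the question. Because the sort is stable, records with identical keys keep their input order.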

How would I print the elements matching a regex pattern in order?

I have a text file that has text in this format:
ptr[0] = Alloc(1) returned 1000 (searched 1 elements)
Free List [ Size 1 ]: [ addr:1001 sz:99 ]
Free(ptr[0]) returned 0
Free List [ Size 2 ]: [ addr:1000 sz:1 ] [ addr:1001 sz:99 ]
ptr[1] = Alloc(7) returned 1001 (searched 2 elements)
Free List [ Size 2 ]: [ addr:1000 sz:1 ] [ addr:1008 sz:92 ]
Free(ptr[1]) returned 0
Free List [ Size 3 ]: [ addr:1000 sz:1 ] [ addr:1001 sz:7 ] [ addr:1008 sz:92 ]
ptr[2] = Alloc(5) returned 1001 (searched 3 elements)
Free List [ Size 3 ]: [ addr:1000 sz:1 ] [ addr:1006 sz:2 ] [ addr:1008 sz:92 ]
Free(ptr[2]) returned 0
Free List [ Size 5 ]: [ addr:1000 sz:1 ] [ addr:1001 sz:5 ] [ addr:1006 sz:2 ] [ addr:1008 sz:8 ] [ addr:1016 sz:84 ]
And I am trying to print out only the values that match with the sz: in the text file and print them in the order they are in but as a list. Like so:
$ awk -f list.awk file.txt | head
99
1 99
1 92
1 7 92
1 2 92
1 5 2 8 84
I've tried the following, but it prints out only the lines that contain sz:. How could I break it further to get the output I want?
/Free List/{
s = $0
split(s, a, /sz:/)
print s
}
The following awk solutions may help.
Solution 1: if you want only the digits that follow the string sz:, try the following.
awk '{while(match($0,/sz:[0-9]+/)){val=(val?val FS:"") substr($0,RSTART+3,RLENGTH-3);$0=substr($0,RSTART+RLENGTH)}}val!=""{print val;val=""}' Input_file
Here is the same solution in multi-line form.
awk '
{
while(match($0,/sz:[0-9]+/)){
val=(val?val FS:"") substr($0,RSTART+3,RLENGTH-3);
$0=substr($0,RSTART+RLENGTH)}
}
val!=""{
print val;
val=""
}
' Input_file
Solution 2: if you also need the string sz: together with the values, try the following.
awk '{while(match($0,/sz:[0-9]+/)){val=(val?val FS:"") substr($0,RSTART,RLENGTH);$0=substr($0,RSTART+RLENGTH)}}val!=""{print val;val=""}' Input_file
Here is the same solution in multi-line form.
awk '
{
while(match($0,/sz:[0-9]+/)){
val=(val?val FS:"") substr($0,RSTART,RLENGTH);
$0=substr($0,RSTART+RLENGTH)}
}
val!=""{
print val;
val=""
}
' Input_file
NOTE: If you want to perform this operation only on lines that contain the string Free List, put the pattern /Free List/ in front of the first { of the awk program in the solutions above.
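To make the note concrete, here is a small sketch of the restricted variant in POSIX awk. It is an illustration under assumptions, not the answerer's exact code: the wrapper name sz_values and the variables out and line are chosen here, and the loop works on a copy of the record rather than overwriting $0.

```shell
# sz_values: hypothetical helper name chosen for this sketch.
sz_values() {
    awk '
    /Free List/ {                            # only lines that mention Free List
        out = ""
        line = $0                            # work on a copy of the record
        while (match(line, /sz:[0-9]+/)) {
            # skip the 3 characters of "sz:" and keep only the digits
            out = (out == "" ? "" : out " ") substr(line, RSTART + 3, RLENGTH - 3)
            line = substr(line, RSTART + RLENGTH)
        }
        if (out != "") print out
    }
    ' "$@"
}

# Demo on the first lines of the sample file
printf '%s\n' \
    'ptr[0] = Alloc(1) returned 1000 (searched 1 elements)' \
    'Free List [ Size 1 ]: [ addr:1001 sz:99 ]' \
    'Free(ptr[0]) returned 0' \
    'Free List [ Size 2 ]: [ addr:1000 sz:1 ] [ addr:1001 sz:99 ]' | sz_values
```

The demo prints "99" and then "1 99", matching the first two lines of the desired output.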
if perl is okay:
$ perl -lne 'print join " ", /sz:(\d+)/g if /Free List/' ip.txt
99
1 99
1 92
1 7 92
1 2 92
1 5 2 8 84
if /Free List/ : run only if the line contains Free List
/sz:(\d+)/g : match all digits that follow sz:
print join " " : print those matches separated by a space
See https://perldoc.perl.org/perlrun.html#Command-Switches for details on the -lne options
Using bash and grep:
while IFS= read -r line; do
x=$(grep -oP '(sz:\K\d+)' <<< "$line")
[[ $x ]] && echo $x
done < file
Output :
99
1 99
1 92
1 7 92
1 2 92
1 5 2 8 84
With GNU awk for FPAT:
$ awk -v FPAT='sz:[0-9]+' '{for (i=1;i<=NF;i++) printf "%s%s", substr($i,4), (i<NF?OFS:ORS)}' file
99
1 99
1 92
1 7 92
1 2 92
1 5 2 8 84
With any awk:
$ awk '{out=""; while (match($0,/sz:[0-9]+/)) { out = (out=="" ? "" : out OFS) substr($0,RSTART+3,RLENGTH-3); $0=substr($0,RSTART+RLENGTH) } $0=out } NF' file
99
1 99
1 92
1 7 92
1 2 92
1 5 2 8 84

H264 encoding and decoding using Videotoolbox

I was testing the encoding and decoding using videotoolbox, to convert the captured frames to H264 and using that data to display it in AVSampleBufferdisplayLayer.
The error occurs during decompression: CMVideoFormatDescriptionCreateFromH264ParameterSets fails with error code -12712.
I followed this code from mobisoftinfotech.com:
status = CMVideoFormatDescriptionCreateFromH264ParameterSets(
kCFAllocatorDefault, 2,
(const uint8_t *const *)parameterSetPointers,
parameterSetSizes, 4, &_formatDesc);
videoCompressionTest; can anyone figure out the problem?
I am not sure whether you have figured out the problem yet. However, I found 2 places in your code that lead to the error. After fixing them and running your test app locally, it seems to work fine. (Tested with Xcode 9.4.1, macOS 10.13)
The first one is in the -(void)CompressAndConvertToData:(CMSampleBufferRef)sampleBuffer method, where the while loop should look like this:
while (bufferOffset < blockBufferLength - AVCCHeaderLength) {
// Read the NAL unit length
uint32_t NALUnitLength = 0;
memcpy(&NALUnitLength, bufferDataPointer + bufferOffset, AVCCHeaderLength);
// Convert the length value from big-endian to host byte order
NALUnitLength = CFSwapInt32BigToHost(NALUnitLength);
// Write start code to the elementary stream
[elementaryStream appendBytes:startCode length:startCodeLength];
// Write the NAL unit without the AVCC length header to the elementary stream
[elementaryStream appendBytes:bufferDataPointer + bufferOffset + AVCCHeaderLength
length:NALUnitLength];
// Move to the next NAL unit in the block buffer
bufferOffset += AVCCHeaderLength + NALUnitLength;
}
uint8_t *bytes = (uint8_t*)[elementaryStream bytes];
int size = (int)[elementaryStream length];
[self receivedRawVideoFrame:bytes withSize:size];
The second place is the decompression code where you process NALU type 8, that is, the block of code in the if(nalu_type == 8) statement. This is a tricky one.
To fix it, update
for (int i = _spsSize + 12; i < _spsSize + 50; i++)
to
for (int i = _spsSize + 12; i < _spsSize + 12 + 50; i++)
And you are free to remove this hack
//was crashing here
if(_ppsSize == 0)
_ppsSize = 4;
Why? Let's print out the frame packet format.
po frame
▿ 4282 elements
- 0 : 0
- 1 : 0
- 2 : 0
- 3 : 1
- 4 : 39
- 5 : 100
- 6 : 0
- 7 : 30
- 8 : 172
- 9 : 86
- 10 : 193
- 11 : 112
- 12 : 247
- 13 : 151
- 14 : 64
- 15 : 0
- 16 : 0
- 17 : 0
- 18 : 1
- 19 : 40
- 20 : 238
- 21 : 60
- 22 : 176
- 23 : 0
- 24 : 0
- 25 : 0
- 26 : 1
- 27 : 6
- 28 : 5
- 29 : 35
- 30 : 71
- 31 : 86
- 32 : 74
- 33 : 220
- 34 : 92
- 35 : 76
- 36 : 67
- 37 : 63
- 38 : 148
- 39 : 239
- 40 : 197
- 41 : 17
- 42 : 60
- 43 : 209
- 44 : 67
- 45 : 168
- 46 : 0
- 47 : 0
- 48 : 3
- 49 : 0
- 50 : 0
- 51 : 3
- 52 : 0
- 53 : 2
- 54 : 143
- 55 : 92
- 56 : 40
- 57 : 1
- 58 : 221
- 59 : 204
- 60 : 204
- 61 : 221
- 62 : 2
- 63 : 0
- 64 : 76
- 65 : 75
- 66 : 64
- 67 : 128
- 68 : 0
- 69 : 0
- 70 : 0
- 71 : 1
- 72 : 37
- 73 : 184
- 74 : 32
- 75 : 1
- 76 : 223
- 77 : 205
- 78 : 248
- 79 : 30
- 80 : 231
… more
The first NALU start code found in if (nalu_type == 7) is 0, 0, 0, 1 at indexes 15 to 18. The next 0, 0, 0, 1 (indexes 23 to 26) is type 6, and the type 8 NALU start code is at indexes 68 to 71. That is why I modified the for loop to scan from the start index (_spsSize + 12) over a range of 50.
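To check where the start codes sit in a dump like the one above, a quick sketch can scan the bytes for the 4-byte sequence 0, 0, 0, 1 and report the NAL unit type of the byte that follows (in H.264 the type is the low 5 bits of that byte). This is only an illustration and assumes the bytes are given as space-separated decimal values on one line; the wrapper name nal_types is made up, and 3-byte start codes (0, 0, 1) are deliberately ignored.

```shell
# nal_types: hypothetical helper name chosen for this sketch.
nal_types() {
    awk '
    {
        # fields are 1-based, so field i holds the byte at offset i - 1
        for (i = 1; i + 4 <= NF; i++)
            if ($i == 0 && $(i+1) == 0 && $(i+2) == 0 && $(i+3) == 1)
                printf "offset %d: nal_unit_type %d\n", i - 1, $(i+4) % 32
    }
    ' "$@"
}

# The first 30 bytes of the frame printed above
echo "0 0 0 1 39 100 0 30 172 86 193 112 247 151 64 0 0 0 1 40 238 60 176 0 0 0 1 6 5 35" | nal_types
```

For these bytes the sketch reports type 7 at offset 0, type 8 at offset 15 and type 6 at offset 23.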
I haven't fully tested your code to make sure encoding and decoding work as expected. However, I hope this finding helps.
By the way, if I have misunderstood anything, I would love to learn from your comments.