awk: Search missing value in file - awk

awk newbie here! I am asking for help to solve a simple specific task.
Here is file.txt
1
2
3
5
6
7
8
9
As you can see a single number (the number 4) is missing. I would like to print on the console the number 4 that is missing. My idea was to compare the current line number with the entry and whenever they don't match I would print the line number and exit. I tried
cat file.txt | awk '{ if ($NR != $1) {print $NR; exit 1} }'
But it prints only a newline.
I am trying to learn awk via this small exercise. I am therefore mainly interested in solutions using awk. I also welcome an explanation for why my code does not do what I would expect.

Use NR instead of $NR:
awk '{ if (NR != $1) {print NR; exit 1} }' file.txt
4
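As for why the original attempt prints only a newline: NR is the current line number, but $NR is the field at position NR. On line 1 that is $1, so the test is false; on line 2 it is $2, which is empty in a one-column file, so the test is true and print $NR prints an empty line before exiting. A small sketch that makes this visible (show_nr is just an illustrative name, and the inline input stands in for file.txt):

```shell
# Print NR next to $NR for each line; the brackets make empty fields visible.
show_nr() {
    awk '{ printf "NR=%d $NR=[%s] $1=[%s]\n", NR, $NR, $1 }' "$@"
}

printf '%s\n' 1 2 3 5 | show_nr
```

The second line of output is NR=2 $NR=[] $1=[2], which is the empty field the original code printed.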

Since you already have a solution, here is another approach that compares each value with the previous one.
awk '$1!=p+1{print p+1} {p=$1}' file
Your positional comparison won't work if more than one value is missing.
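If several consecutive numbers can be missing (say 3 and 4 are both absent), the one-liner above reports only the first number of each gap. A possible extension, assuming sorted positive integers starting at 1 with one value per line (the function name missing is made up for this sketch):

```shell
# Print every integer that lies strictly between the previous value and the current one.
missing() {
    awk '{
        for (n = p + 1; n < $1; n++) print n   # p is 0 before the first line
        p = $1                                 # remember the current value
    }' "$@"
}

printf '%s\n' 1 2 5 6 9 | missing
```

For this input it prints 3, 4, 7 and 8, one per line.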

Maybe this will help:
seq $(tail -1 file)|diff - file|grep -Po '.*(?=d)'
4
Since I am learning awk as well, here is an awk version:
awk 'BEGIN{i=0}{i++;if(i!=$1){print i;i=$1}}' file
4
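Explanation: `i` is a counter (not an array) holding the value expected on the current line. It is incremented for every line with `i++` and compared with `$1`; if the two differ, the expected value is printed as missing and `i` is reset to `$1` so that counting continues from there. Note that only the first number of each gap is reported.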
cat file
1
2
3
5
6
7
8
9
11
12
13
15
awk 'BEGIN{i=0}{i++;if(i!=$1){print i;i=$1}}' file
4
10
14

Related

How to obtain the difference between 2 columns with awk in a tsv

I have a TSV file like the following:
Hi 10 6
hello 7 1
Hi 6 2
The desired output is:
Hi 10 6 4
hello 7 1 6
Hi 6 2 4
I'd like to create a 4th column containing the absolute value of the difference between columns $2 and $3.
I tried the following line, but it does not give the right result:
awk -F\t '{print $0 OFS $2-$3}' file
How can I do this?
Marco
When you write -F\t without quotes you're inviting the shell to interpret the string and so the shell will read the backslash and leave you with -Ft. In some awks -Ft means "use a tab as the separator" while in others it means "use the character t as the separator". I assume since you say your file is a TSV that you want it to use tabs so just write your code to not invite the shell to interpret it, i.e. -F'\t' instead of -F\t.
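To see what the shell actually hands to awk, you can print the argument with printf. This is only an illustration of shell quoting, not part of the solution:

```shell
# Unquoted: the shell consumes the backslash, so awk would receive -Ft
printf '%s\n' -F\t

# Quoted: the backslash survives, so awk would receive -F\t
printf '%s\n' -F'\t'
```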
After that, getting the absolute value is straightforward and well covered in many postings:
$ awk 'BEGIN{FS=OFS="\t"} {print $0, ($2>$3 ? $2-$3 : $3-$2)}' file
Hi 10 6 4
hello 7 1 6
Hi 6 2 4
I changed from using -F'\t' to FS="\t" because both FS and OFS need to be set to "\t" and so setting them both together to the same character instead of separately avoids duplication.
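If the absolute value is needed in more than one place, a small user-defined function may read better than the ternary. A sketch under the same assumptions (tab-separated input, numeric columns 2 and 3); the names absdiff and abs are my own, since awk has no built-in abs:

```shell
# Append |$2 - $3| as a new tab-separated column.
absdiff() {
    awk 'function abs(x) { return x < 0 ? -x : x }
         BEGIN { FS = OFS = "\t" }
         { print $0, abs($2 - $3) }' "$@"
}

# The last row has $3 > $2, so the sign flip is actually exercised.
printf 'Hi\t10\t6\nhello\t7\t1\nx\t1\t7\n' | absdiff
```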

Print every second consecutive field in two columns - awk

Assume the following file
#zvview.exe
#begin Present/3
77191.0000 189.320100 0 0 3 0111110 16 1
-8.072430+6-8.072430+6 77190 0 1 37111110 16 2
37 2 111110 16 3
8.115068+6 0.000000+0 8.500000+6 6.390560-2 9.000000+6 6.803440-1111110 16 4
9.500000+6 1.685009+0 1.000000+7 2.582780+0 1.050000+7 3.260540+0111110 16 5
37 2 111110 16 18
What I would like to do is print, in two columns, the fields from line 6 onwards; selecting the lines can be done using NR. The tricky part is the following: each pair of consecutive fields should become one output row, and an E should be added before the exponent sign, so that the output file looks like this
8.115068E+6 0.000000E+0
8.500000E+6 6.390560E-2
9.000000E+6 6.803440E-1
9.500000E+6 1.685009E+0
1.000000E+7 2.582780E+0
1.050000E+7 3.260540E+0
As the output shows, I want to keep only the first 10 characters of $6.
How can this be done in awk?
You can do it all in awk, but it is perhaps easier with the Unix toolset:
$ sed -n '6,7p' file | cut -c2-66 | tr ' ' '\n' | pr -2ats' '
8.115068+6 0.000000+0
8.500000+6 6.390560-2
9.000000+6 6.803440-1
9.500000+6 1.685009+0
1.000000+7 2.582780+0
1.050000+7 3.260540+0
Here is an awk-only solution for comparison:
$ awk 'NR>=6 && NR<=7{$6=substr($6,1,10);
for(i=1;i<=6;i+=2) {f[++c]=$i;s[c]=$(i+1)}}
END{for(i=1;i<=c;i++) print f[i],s[i]}' file
8.115068+6 0.000000+0
8.500000+6 6.390560-2
9.000000+6 6.803440-1
9.500000+6 1.685009+0
1.000000+7 2.582780+0
1.050000+7 3.260540+0
Perhaps a shorter version:
$ awk 'NR>=6 && NR<=7{$6=substr($6,1,10);
for(i=1;i<=6;i+=2) print $i FS $(i+1)}' file
8.115068+6 0.000000+0
8.500000+6 6.390560-2
9.000000+6 6.803440-1
9.500000+6 1.685009+0
1.000000+7 2.582780+0
1.050000+7 3.260540+0
To convert the format to standard scientific notation, you can pipe the result to sed, or embed something similar in the awk script (using gsub):
... | sed 's/[+-]/E&/g'
8.115068E+6 0.000000E+0
8.500000E+6 6.390560E-2
9.000000E+6 6.803440E-1
9.500000E+6 1.685009E+0
1.000000E+7 2.582780E+0
1.050000E+7 3.260540E+0
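The same conversion can be embedded in awk, as suggested above. One caveat with a plain [+-] match is that it would also hit a leading minus sign, as in -8.072430+6. A sketch that anchors the match to the trailing exponent instead (add_e is a hypothetical name, and the input is assumed to be whitespace-separated fields):

```shell
# Insert an E before the trailing exponent sign of every field.
add_e() {
    awk '{
        for (i = 1; i <= NF; i++)
            sub(/[+-][0-9]+$/, "E&", $i)   # & stands for the matched sign and digits
        print
    }' "$@"
}

printf '%s\n' '8.115068+6 0.000000+0' '-8.072430+6 6.390560-2' | add_e
```

The leading minus of -8.072430+6 is left alone because the pattern only matches a sign followed by digits at the end of the field.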
With GNU awk for FIELDWIDTHS:
$ cat tst.awk
BEGIN { FIELDWIDTHS="9 2 9 2 9 2 9 2 9 2 9 2" }
NR>5 && NR<8 {
    for (i=1;i<NF;i+=4) {
        print $i "E" $(i+1), $(i+2) "E" $(i+3)
    }
}
$ awk -f tst.awk file
8.115068E+6 0.000000E+0
8.500000E+6 6.390560E-2
9.000000E+6 6.803440E-1
9.500000E+6 1.685009E+0
1.000000E+7 2.582780E+0
1.050000E+7 3.260540E+0
If you really want to get rid of the leading blanks then there are various ways to do it (simplest being gsub(/ /,"",$<field number>) on the relevant fields) but I left them in because the above allows your output to line up properly if/when your numbers start with a -, like they do on line 4 of your sample input.
If you don't have GNU awk, get it as you're missing a LOT of extremely useful functionality.
I tried to combine @karafka's answer with substr, and the following does the trick!
awk 'NR>=6 && NR<=7{$6=substr($6,1,10);for(i=1;i<=6;i+=2) print substr($i,1,8) "E" substr($i,9) FS substr($(i+1),1,8) "E" substr($(i+1),9)}' file
and the output is
8.115068E+6 0.000000E+0
8.500000E+6 6.390560E-2
9.000000E+6 6.803440E-1
9.500000E+6 1.685009E+0
1.000000E+7 2.582780E+0
1.050000E+7 3.260540E+0

AWK: Divide any element of any row by some element of another row

I have got a text file with some structure like this:
2 2 4 5 6
1 9 7 6 2
1 5 2 8 5
I want to be able to divide any element of any row by an element of another row. For example if I wanted to divide the 3rd element of the 1st row by the 2nd element of the 3rd row that would give:
4/5 = 0.8
I couldn't figure out a smart way to do this with AWK. Suggestions?
This MAY be what you want but it's hard to tell without more details and the expected output:
$ awk -v num=1,5 -v den=3,3 '{for (i=1;i<=NF;i++) cell[NR","i]=$i} END{print (cell[den] ? cell[num]/cell[den] : "NaN")}' file
3
$ awk -v num=3,4 -v den=1,2 '{for (i=1;i<=NF;i++) cell[NR","i]=$i} END{print (cell[den] ? cell[num]/cell[den] : 0)}' file
4
If (i1, j1) and (i2, j2) are the coordinates of the numerator and the denominator, you can do this:
i1=1
j1=3
i2=3
j2=2
awk 'NR=='$i1'{a=$'$j1'} NR=='$i2' {b=$'$j2'} END {print a"/"b " = " a/b}' file
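A variation on the same idea passes the coordinates with -v instead of splicing shell variables into the awk program, which avoids quoting surprises. This is only a sketch; the function name divide_cells and its argument order (row and column of the numerator, then of the denominator) are made up for the example:

```shell
# usage: divide_cells ROW1 COL1 ROW2 COL2 [file]
divide_cells() {
    r1=$1 c1=$2 r2=$3 c2=$4
    shift 4
    awk -v r1="$r1" -v c1="$c1" -v r2="$r2" -v c2="$c2" '
        NR == r1 { a = $c1 }          # numerator cell
        NR == r2 { b = $c2 }          # denominator cell
        END {
            if (b == 0) { print "division by zero"; exit 1 }
            print a "/" b " = " a / b
        }' "$@"
}

printf '2 2 4 5 6\n1 9 7 6 2\n1 5 2 8 5\n' | divide_cells 1 3 3 2
```

With the sample data from the question this prints 4/5 = 0.8.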

awk - skip last line for condition

When I wrote an answer for this question I used the following:
something | sed '$d' | awk '$1>3{print $0}'
e.g.
print only lines where the 1st field is bigger than 3 (awk)
but omit the last line sed '$d'.
This seems to me like a bit of duplicated work; surely it is possible to do the above with awk alone, without the sed?
I'm an awkdiot - so, can someone suggest a solution?
Here's one way you could do it:
$ printf "%s\n" {1..10} | awk 'NR>1&&p>3{print p}{p=$1}'
4
5
6
7
8
9
Basically, print the first field of the previous line, rather than the current one.
As Wintermute has rightly pointed out in the comments (thanks), in order to print the whole line, you can modify the code to this:
awk 'p { print p; p="" } $1 > 3 { p = $0 }'
This only assigns the contents of the line to p if the first field is greater than 3, and prints a stored line one step later, so a match on the last line is never printed.
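To check that this behaves like sed '$d' | awk '$1>3' on multi-column input, here is the same program run on a small made-up sample (the function name skip_last and the sample lines are only for illustration):

```shell
# Print matching lines one step late, so a match on the final line is never emitted.
skip_last() {
    awk 'p { print p; p="" } $1 > 3 { p = $0 }' "$@"
}

printf '%s\n' '5 a' '2 b' '7 c' '9 last' | skip_last
```

This prints 5 a and 7 c; the final line 9 last matches the condition but is dropped because nothing follows it.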

Extracting block of data from a file

I have a problem, which surely can be solved with an awk one-liner.
I want to split an existing data file, which consists of blocks of data into separate files.
The datafile has the following form:
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
And I want to store every single block of data in a separate file, named - for example - "1.dat", "2.dat", "3.dat",...
The problem is that the blocks don't have a fixed number of lines; they are just delimited by two "new lines".
Thanks in advance,
Jürgen
This should get you started:
awk '{ print > (++i ".dat") }' RS= file.txt
If by two "new lines" you mean, two newline characters:
awk '{ print > (++i ".dat") }' RS="\n\n" file.txt
See how the results differ? Setting a null RS (i.e. the first example) is probably what you're looking for.
Another approach:
awk 'NF != 0 {print > ($1 ".dat")}' file.txt
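Putting the first suggestion into a reusable form: a sketch using paragraph mode (RS set to the empty string), with the file name parenthesized and each file closed after writing so that a large number of blocks does not exhaust open file handles. The function name split_blocks is hypothetical, and the demo runs in a scratch directory:

```shell
# Write each blank-line-delimited block to 1.dat, 2.dat, ... in the current directory.
split_blocks() {
    awk -v RS= '{
        out = ++i ".dat"     # build the file name first, then redirect to it
        print > out
        close(out)           # release the handle before the next block
    }' "$@"
}

cd "$(mktemp -d)" || exit 1
printf '1 1\n1 2\n\n\n2 1\n2 2\n' | split_blocks
cat 2.dat
```

In paragraph mode any run of blank lines separates blocks, so it does not matter whether the blocks are one or two empty lines apart.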