Reading a file from line 4 to the end - awk

I want to read a file from line 4 to the very end. Is there any way to do this with awk or something?

This sed command will do:
sed -n '4,$p' file.txt
Or using awk:
awk 'NR>=4' file.txt
Or using tail:
tail -n +4 file.txt

awk 'NR >= 4 {print $0}'
For example
$> seq 101 110 | awk 'NR >= 4 {print $0}'
104
105
106
107
108
109
110

tail -n +4 filename will serve your purpose.
See man tail for more.

Here's a method (that can depend on the type of shell you use; bash should work):
tmpvar=$(wc -l < a_file); tail -n $((tmpvar - 3)) a_file
(From line 4 onward there are tmpvar - 3 lines, hence the -3.)
Here's another method that should work in more shells:
cat -n a_file | awk '{if ($1 >= 4) {sub(/^[[:blank:]]*[0-9]+\t/, ""); print}}'
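As a quick sanity check of that last one, a hypothetical run reusing the seq example from above:
$ seq 101 110 | cat -n | awk '{if ($1 >= 4) {sub(/^[[:blank:]]*[0-9]+\t/, ""); print}}'
104
105
106
107
108
109
110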

Replace word in line by combining after grep

I need to replace a word after performing a grep and getting the last line of the result.
Here my example file:
aaa ts1 ts2
bbb ts3 ts4
aaa ts5 ts6
aaa ts7 NONE
What I need is to select all lines containing 'aaa', get the last one in the result and replace NONE.
I tried
cat <file> | grep "aaa" | tail -n 1 | sed -i 's/NONE/ts8/g'
but it doesn't work (sed -i expects a file name rather than stdin, and even without -i the pipeline would only print the one modified line instead of rewriting the file).
Any suggestion to do that?
Thanks
With a tac + awk solution, please try the following:
tac Input_file | awk '/aaa/ && ++count==1{sub(/NONE/,"ts8")} 1' | tac
Once you are happy with the above command, try the following to save the changes in place into Input_file:
tac Input_file | awk '/aaa/ && ++count==1{sub(/NONE/,"ts8")} 1' | tac > temp && mv temp Input_file
Explanation: First print Input_file in reverse order with tac, then send its output to awk, which substitutes NONE with ts8 in the very first line containing aaa (which is the last such line in the original order). All other lines are printed unchanged. The output is piped to tac again to restore the file's original line order.
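On the sample file from the question, the first command should produce:
aaa ts1 ts2
bbb ts3 ts4
aaa ts5 ts6
aaa ts7 ts8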
For doing this in a single command, this should work in any version of awk:
awk 'FNR==NR {if ($1=="aaa") n=FNR; next} FNR == n {$3="ts8"} 1' file{,}
aaa ts1 ts2
bbb ts3 ts4
aaa ts5 ts6
aaa ts7 ts8
Here file{,} is shell brace expansion for file file, so awk reads the file twice: the first pass records in n the record number of the last line starting with aaa, and the second pass replaces that line's third field.
To save the output in the same file use:
awk 'FNR==NR {if ($1=="aaa") n=FNR; next}
FNR == n {$3="ts8"} 1' file{,} > file.out && mv file.out file
Or using GNU sed, you may use:
sed -i -Ez 's/(.*\naaa[[:blank:]]+[^[:blank:]]+[[:blank:]]+)NONE/\1ts8/' file
Here -z slurps the whole file as a single record, and the greedy .* ensures the captured group ends at the last line starting with aaa.
cat file
aaa ts1 ts2
bbb ts3 ts4
aaa ts5 ts6
aaa ts7 ts8
If you want to get the last line that matches aaa at the start, you can go through all lines, and in the END block, print the last occurrence and replace NONE with ts8 using awk:
awk '$1=="aaa"{last=$0}END{sub(/NONE/,"ts8",last);print last}' file
In parts:
awk '
$1=="aaa" {                # If the first field is aaa
  last=$0                  # Set variable last to the whole line (overwritten on each match)
}
END {                      # Run once, after the last input line
  sub(/NONE/,"ts8",last)   # Replace NONE with ts8 in the last variable
  print last
}
' file
Output
aaa ts7 ts8

awk + How do I find duplicates in a column?

How do I find duplicates in a column?
$ head countries_lat_long_int_code3.csv | cat -n
1 country,latitude,longitude,name,code
2 AD,42.546245,1.601554,Andorra,376
3 AE,23.424076,53.847818,United Arab Emirates,971
4 AF,33.93911,67.709953,Afghanistan,93
5 AG,17.060816,-61.796428,Antigua and Barbuda,1
6 AI,18.220554,-63.068615,Anguilla,1
7 AL,41.153332,20.168331,Albania,355
8 AM,40.069099,45.038189,Armenia,374
9 AN,12.226079,-69.060087,Netherlands Antilles,599
10 AO,-11.202692,17.873887,Angola,244
For instance this has duplicates in the 5th column.
5 AG,17.060816,-61.796428,Antigua and Barbuda,1
6 AI,18.220554,-63.068615,Anguilla,1
How do I view all the others in this file?
I know I can do this:
awk -F, 'NR>1{print $5}' countries_lat_long_int_code3.csv | sort
And I can eyeball and see if there is any duplicates, but is there a better way?
Or I can do this:
Find out how many values there are in total
$ awk -F, 'NR>1{print $5}' countries_lat_long_int_code3.csv | sort | wc -l
210
Find out how many unique values there are
$ awk -F, 'NR>1{print $5}' countries_lat_long_int_code3.csv | sort | uniq | wc -l
183
Therefore there are at most 27 (210-183) duplicates.
EDIT1
My desired output would be something as follows, basically all the columns but just showing the rows that are duplicates:
5 AG,17.060816,-61.796428,Antigua and Barbuda,1
6 AI,18.220554,-63.068615,Anguilla,1
This will give you the duplicated codes
awk -F, 'a[$5]++{print $5}'
if you're only interested in count of duplicate codes
awk -F, 'a[$5]++{count++} END{print count}'
To print duplicated rows try this
awk -F, '$5 in a{print a[$5]; print} {a[$5]=$0}'
This will print the whole row with duplicates found in col $5:
awk -F, 'a[$5]++{print $0}'
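Note that a[$5]++ is false on the first occurrence of a code, so this only prints the second and later duplicated rows. To get every row whose code appears more than once (matching the desired output in the question), a two-pass sketch reading the file twice:
awk -F, 'NR==FNR {count[$5]++; next} count[$5] > 1' countries_lat_long_int_code3.csv countries_lat_long_int_code3.csv
On the ten sample lines shown in the question this would print both the AG and AI rows.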
This is the least memory-aggressive approach I can come up with:
$ cat infile
country,latitude,longitude,name,code
AD,42.546245,1.601554,Andorra,376
AE,23.424076,53.847818,United Arab Emirates,971
AF,33.93911,67.709953,Afghanistan,93
AG,17.060816,-61.796428,Antigua and Barbuda,1
AI,18.220554,-63.068615,Anguilla,1
AL,41.153332,20.168331,Albania,355
AM,40.069099,45.038189,Armenia,374
AN,12.226079,-69.060087,Netherlands Antilles,599
AO,-11.202692,17.873887,Angola,355
$ awk -F\, '$NF in a{if (a[$NF]!=0){print a[$NF];a[$NF]=0}print;next}{a[$NF]=$0}' infile
AG,17.060816,-61.796428,Antigua and Barbuda,1
AI,18.220554,-63.068615,Anguilla,1
AL,41.153332,20.168331,Albania,355
AO,-11.202692,17.873887,Angola,355
NOTE: I have included another duplicate for testing purposes.
If you just want each duplicated value printed once, add this at the end of the awk command:
awk ... ... | sort -u
That will print each duplicated value only once, in alphabetical order.
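For what it's worth, another common idiom is to count occurrences with uniq -c and keep only the codes that appear more than once:
$ awk -F, 'NR>1{print $5}' countries_lat_long_int_code3.csv | sort | uniq -c | awk '$1 > 1 {print $2}'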

Awk pass variable containing columns to be printed

How can I pass a variable to awk specifying which columns to print from a file?
In this trivial case file.txt contains a single line
11 22 33 44 55
This is what I've tried:
awk -v a='2/4' -v b='$2/$4' '{print a"\n"$a"\n"b"\n"$b}' file.txt
output:
2/4
22
$2/$4
11 22 33 44 55
desired output:
0.5
Is there any way to do this type of "eval" of a variable as a command?
Here is one method for dividing columns specified in variables:
$ awk -v num=2 -v denom=4 '{print $num/$denom}' file.txt
0.5
If you trust the person who creates the shell variable b, then here is a method that offers flexibility:
$ b='$2/$4'; awk "{print $b}" file.txt
0.5
$ b='$1*$2'; awk "{print $b}" file.txt
242
$ b='$2,$2/$4,$5'; awk "{print $b}" file.txt
22 0.5 55
The flexibility here comes from the fact that b can contain any awk code, which is also why you must trust whoever creates it.
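If you only need to select whole columns rather than evaluate arbitrary expressions, a safer middle ground is to pass a comma-separated column list and split it inside awk. A minimal sketch, assuming the same one-line file.txt:
$ awk -v cols='2,4' 'BEGIN{n=split(cols, c, ",")} {for (i=1; i<=n; i++) printf "%s%s", $(c[i]), (i<n ? OFS : ORS)}' file.txt
22 44
Nothing in cols is evaluated as code, so an untrusted value can at worst select the wrong columns.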

Bash, Postfix, AWK: Error in filtering deferred mail output

This is what I have tried so far:
cat /var/spool/postfix/deferred/D3B921090 | awk -F"/" '{print $6}' |awk '{$1="" print $0}' | sort | uniq -c | sort -n
and
awk -F"/" '{print $6}' < /var/spool/postfix/deferred/D3B921090 | awk '{$1="" print $0}' | sort | uniq -c | sort -n
I get the following error message when trying to run either command:
awk: line 1: syntax error at or near print
What am I doing wrong?
awk '{$1="" print $0}'
is not a syntactically valid expression, did you mean
awk '{$1=""; print $0}'
which is equal to
awk '{$1=""}1'
?
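One more thing worth knowing about this fix: assigning $1="" rebuilds the record, which leaves a leading output field separator (a space) on every line. The sort | uniq -c counts are unaffected since the prefix is uniform, but if you want clean output, one way to strip it:
awk '{$1=""; sub(/^ /, ""); print}'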

GNU parallel used with xargs and awk

I have two large tab separated files A.tsv and B.tsv, they look like (the header is not in the file):
A.tsv:
ID AGE
User1 18
...
B.tsv:
ID INCOME
User4 49000
...
I want to select the list of IDs in A such that 10 <= AGE <= 20, and then select the rows in B that match that list. And I want to use the GNU parallel tool. My attempt has two steps:
cat A.tsv | parallel --pipe -q awk '{ if ($3 >= 10 && $3 <= 20) print $1}' > list.tsv
cat list.tsv | parallel --pipe -q xargs -I% awk 'FNR==NR{a[$1];next}($1 in a)' % B.tsv > result.tsv
The first step works but the second one comes with error like:
awk: cannot open User1 (No such file or directory)
How can I fix this? Does this method work even if A.tsv and list.tsv are 2 to 3 times bigger than memory?
$ for I in $(seq 8 2 22); do echo -e "User$I\t$I" >> A.txt; done; cat A.txt
User8 8
User10 10
User12 12
User14 14
User16 16
User18 18
User20 20
User22 22
$ for I in $(seq 8 2 22); do echo -e "User$I\t100${I}00" >> B.txt; done; cat B.txt
User8 100800
User10 1001000
User12 1001200
User14 1001400
User16 1001600
User18 1001800
User20 1002000
User22 1002200
$ cat A.txt | parallel --pipe -q awk '{if ($2 >= 10 && $2 <= 20) print $1}' > list.txt
$ cat B.txt | parallel --pipe -q grep -f list.txt
User10 1001000
User12 1001200
User14 1001400
User16 1001600
User18 1001800
User20 1002000
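One caveat with grep -f here: each ID in list.txt is matched as a substring anywhere in the line, so a User1 in the list would also match User10, User12, and so on in B. Fixed-string, whole-word matching is safer:
$ cat B.txt | parallel --pipe -q grep -wFf list.txt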
My solution: only xargs and awk, a single line with no intermediate file, and no new tool to install:
awk '{if ($2 >= 10 && $2 <= 20) print $1}' A.tsv | xargs -I myItem awk --assign quebuscar=myItem '$1==quebuscar {print}' B.tsv
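For comparison, the whole job can also be done in a single awk pass over both files, avoiding one awk invocation per matching ID; a sketch, assuming AGE is the second column as in the demo above and that the matching IDs fit in memory:
awk 'NR==FNR {if ($2 >= 10 && $2 <= 20) keep[$1]; next} $1 in keep' A.tsv B.tsv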