Replace values in particular columns of a CSV file - awk

I would like to replace the values that are greater than 20 in columns 5 and 7 with AAA.
Input file:
9179,22.4,-0.1,22.4,2.6,0.1,2.6,39179
9179,98.1,-1.7,98.11,1.9,1.7,2.55,39179
9179,-48.8,0.5,48.8,-1.2,-0.5,1.3,39179
6121,25,0,25,50,0,50,36121
6123,50,0,50,50,0,50,36123
6125,75,0,75,50,0,50,36125
Desired output:
9179,22.4,-0.1,22.4,2.6,0.1,2.6,39179
9179,98.1,-1.7,98.11,1.9,1.7,2.55,39179
9179,-48.8,0.5,48.8,-1.2,-0.5,1.3,39179
6121,25,0,25,AAA,0,AAA,36121
6123,50,0,50,AAA,0,AAA,36123
6125,75,0,75,AAA,0,AAA,36125
I tried the following. With this command I replace the values in column 5, but how do I do it for column 7 too?
awk -F ',' -v OFS=',' '$1 { if ($5>20) $5="AAA"; print}' file
Thanks in advance

Here is another take that makes the set of columns configurable:
$ awk -v cols="5,7" 'BEGIN {FS=OFS=","; split(cols,a)}
{for(i in a) if($a[i]>20) $a[i]="AAA"}1' file
9179,22.4,-0.1,22.4,2.6,0.1,2.6,39179
9179,98.1,-1.7,98.11,1.9,1.7,2.55,39179
9179,-48.8,0.5,48.8,-1.2,-0.5,1.3,39179
6121,25,0,25,AAA,0,AAA,36121
6123,50,0,50,AAA,0,AAA,36123
6125,75,0,75,AAA,0,AAA,36125
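For example, the same script will guard columns 2 and 4 instead, just by changing cols (an untested usage sketch):
awk -v cols="2,4" 'BEGIN {FS=OFS=","; split(cols,a)}
{for(i in a) if($a[i]>20) $a[i]="AAA"}1' file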

awk 'BEGIN{FS=OFS=","} $5>20{$5="AAA"} $7>20{$7="AAA"}1' file
9179,22.4,-0.1,22.4,2.6,0.1,2.6,39179
9179,98.1,-1.7,98.11,1.9,1.7,2.55,39179
9179,-48.8,0.5,48.8,-1.2,-0.5,1.3,39179
6121,25,0,25,AAA,0,AAA,36121
6123,50,0,50,AAA,0,AAA,36123
6125,75,0,75,AAA,0,AAA,36125
You can use multiple condition{action} pairs for multiple checks and actions.
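If the threshold itself may vary, it can be passed in the same way; a minimal sketch (the variable name t is my own choice):
awk -v t=20 'BEGIN{FS=OFS=","} $5>t{$5="AAA"} $7>t{$7="AAA"} 1' file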

Related

Summarizing the contents of a text file into another one using awk

I have a big text file with 2 tab-separated fields. As you can see in the small example, every 2 lines have a number in common. I want to summarize my text file in this way:
look for the lines that have the number in common and sum up the second column of those lines.
Small example:
ENST00000054666.6 2
ENST00000054666.6_2 15
ENST00000054668.5 4
ENST00000054668.5_2 10
ENST00000054950.3 0
ENST00000054950.3_2 4
Expected output:
ENST00000054666.6 17
ENST00000054668.5 14
ENST00000054950.3 4
As you can see, the difference is in both columns: in the 1st column there is only one occurrence of each common number, without the "_2" suffix, and in the 2nd column the value is the sum of the two lines that share that common number in the input file.
I tried this code but it does not return what I want:
awk -F '\t' '{ col2 = $2, $2=col2; print }' OFS='\t' input.txt > output.txt
do you know how to fix it?
Solution 1: The following awk may help you here:
awk '{sub(/_.*/,"",$1)} {a[$1]+=$NF} END{for(i in a){print i,a[i]}}' Input_file
Solution 2: In case your Input_file is sorted by the 1st field, the following may help:
awk '{sub(/_.*/,"",$1)} prev!=$1 && prev{print prev,val;val=""} {val+=$NF;prev=$1} END{if(val){print prev,val}}' Input_file
Append > output.txt to the above commands in case you need the output in a file too.
If order is not a concern, the following may also help:
awk -v FS="\t|_" '{count[$1]+=$NF}
END{for(i in count){printf "%s\t%s%s",i,count[i],ORS;}}' file
ENST00000054668.5 14
ENST00000054950.3 4
ENST00000054666.6 17
Edit:
If the order of the output does matter, the approach below, using a counter, helps:
$ awk -v FS="\t|_" '{count[$1]+=$NF;++i;
if(i==2){printf "%s\t%s%s",$1,count[$1],ORS;i=0}}' file
ENST00000054666.6 17
ENST00000054668.5 14
ENST00000054950.3 4
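If you have GNU awk 4.0 or later, you can keep the unordered array approach and still control the output order through PROCINFO["sorted_in"]; a sketch assuming gawk:
gawk -v FS="\t|_" '{count[$1]+=$NF}
END{PROCINFO["sorted_in"]="@ind_str_asc"; for(i in count) print i "\t" count[i]}' file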

awk: print each column of a file into separate files

I have a file with 100 columns of data. I want to print the first column and the i-th column into 99 separate files. I am trying to use
for i in {2..99}; do awk '{print $1" " $i }' input.txt > data${i}; done
But I am getting errors:
awk: illegal field $(), name "i"
input record number 1, file input.txt
source line number 1
How do I correctly use $i inside {print}?
The following single awk may help here too:
awk -v start=2 -v end=99 '{for(i=start;i<=end;i++){print $1,$i > ("file" i); close("file" i)}}' Input_file
An all-awk solution. First, the test data:
$ cat foo
11 12 13
21 22 23
Then the awk:
$ awk '{for(i=2;i<=NF;i++) print $1,$i > ("data" i)}' foo
and results:
$ ls data*
data2 data3
$ cat data2
11 12
21 22
The for iterates from 2 to the last field. If there are more fields than you desire to process, change the NF to the number you'd like. If, for some reason, a hundred open files would be a problem on your system, you'd need to put the print into a block and add a close call:
$ awk '{for(i=2;i<=NF;i++){f=("data" i); print $1,$i >> f; close(f)}}' foo
If you want to fix what you were trying to do:
for i in {2..99}; do
awk -v x=$i '{print $1" " $x }' input.txt > data${i}
done
Note:
the -v switch of awk passes shell variables into awk
$x is the column whose number is stored in the awk variable x
Note 2: this is not the fastest solution (a single awk call is faster), but it corrects your logic. Ideally, take the time to understand awk; it's never wasted time.

How to insert a new field in the 1st position, with single quotes, using awk

I have very limited knowledge of awk.
I have big CSV files (500,000 lines) with the following line format:
'0000011197118123','136',,'35993706', '33745', '22052', 'appsflyer.com'
'0000011194967123','136',,'35282806', '74518', '30317', 'crashlytics.com'
'0000011199022123','139',,'01363100', '8776250', '373671', 'whatsapp.com'
............
I need to cut the first 8 digits from the first column and add a date field as a new first column (the date should be the day-1 date), like the following:
'2016/03/12','97118123','136',,'35993706','33745','22052','appsflyer.com'
'2016/03/12','94967123','136',,'35282806','74518','30317','crashlytics.com'
'2016/03/12','99022123','139',,'01363100','8776250','373671','whatsapp.com'
Thanks a lot for your time.
M.Tave
You can do something similar to:
awk -F, -v date="2016/03/12" 'BEGIN{OFS=FS}
{sub(/^.{9}/, "'\''", $1)
s="'\''"date"'\''"
$1=s OFS $1
print }' csv_file
I did not understand how you are determining your date, so I just used a string.
Based on comments, you can do:
awk -v d="2016/03/12" 'sub(/^.{9}/,"'\''"d"'\'','\''")' csv_file
$ awk -v d='2016/03/12' '{print "\047" d "\047,\047" substr($0,10)}' file
'2016/03/12','97118123','136',,'35993706', '33745', '22052', 'appsflyer.com'
'2016/03/12','94967123','136',,'35282806', '74518', '30317', 'crashlytics.com'
'2016/03/12','99022123','139',,'01363100', '8776250', '373671', 'whatsapp.com'
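If the new field really has to be yesterday's ("day-1") date rather than a fixed string, one way is to compute it in the shell and pass it in; a sketch assuming GNU date (on BSD/macOS, date -v-1d +%Y/%m/%d is the equivalent):
d=$(date -d yesterday +%Y/%m/%d)   # GNU date
awk -v d="$d" '{print "\047" d "\047,\047" substr($0,10)}' file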

How do I pass a variable into AWK FNR?

#!/bin/bash
export num=50
echo $num
awk -v awk_num=$num 'FNR==2, FNR==$awknum {print $1;}' big_report > short_report
I have a big_report file. The desired output is to print rows 2 to 50 in column 1 of big_report into short_report. However, when I run above the result in short_report, includes all lines in column 1 instead of the specified rows 2-50.
I would really appreciate it if anyone could help! Thanks!!!
Like this:
awk -v awk_num=$num 'FNR==2, FNR==awk_num {print $1}' big_report > short_report
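Two things were wrong in the original: the variable was defined as awk_num but referenced as awknum, and awk variables are never written with a leading $ ($ in awk means "field", so $awknum refers to a field of the record, not to your variable). Since big_report is big, it can also pay to stop reading once past the range; a minimal sketch reusing the same $num:
awk -v n="$num" 'FNR > n {exit} FNR >= 2 {print $1}' big_report > short_report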

Use awk to calculate what percentage column 1 is of column 2 and add it as column 3

I have a file consisting of 2 columns, both containing only whole numbers. I want awk to add a third column which shows what percentage column 1 is of column 2.
So, for example, the file contains:
cat file
15 150
I want awk to add column 3 to show 10 (because 15 is 10% of 150, right?) like this:
15 150 10
The columns are separated by tabs.
Thank you for your help!
Another awk:
awk '$3=100*$1/$2' file
To overwrite the file:
awk '$3=100*$1/$2' file > tmp && mv tmp file
If for some reason you have 0s in your file (a 0 in $2 means a division by zero, and a result of 0 makes the assignment-as-condition false, so the line would not print):
awk '$2>0{$3=100*$1/$2}1' file > tmp && mv tmp file
or
awk '$2>0&&$3=100*$1/$2' file > tmp && mv tmp file
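Since the question says the columns are tab separated, you may also want to set the separators explicitly so the output keeps tabs; a minimal sketch combining that with the zero guard above:
awk 'BEGIN{FS=OFS="\t"} $2>0{$3=100*$1/$2} 1' file > tmp && mv tmp file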
Untested, but an educated guess at what might work:
awk '{ print $1, $2, 100*$1/$2 }' yourfile.txt
To save it somewhere, you'll have to redirect stdout to a file. If you want this to overwrite your original file (don't do this until you've tested that it works!) you could wrap it in a bash script:
#!/bin/bash
awk '{ print $1, $2, 100*$1/$2 }' "$1" > "$1.tmp"
mv "$1.tmp" "$1"
and run it like
./thebashscript.sh yourfile.txt