conditional awk statement to create a new field with additive value - dataframe

Question
How would I use awk to create a new field that holds $2 plus a constant value?
I am planning to cycle through a list of values, but I wouldn't mind using a one-liner for each command.
PseudoCode
awk '$1 == Bob {$4 = $2 + 400}' file
Sample Data
Philip 13 2
Bob 152 8
Bob 4561 2
Bob 234 36
Bob 98 12
Rey 147 152
Rey 15 1547
Expected Output
Philip 13 2
Bob 152 8 408
Bob 4561 2 402
Bob 234 36 436
Bob 98 12 412
Rey 147 152
Rey 15 1547

Just quote Bob. Also, judging by your expected output, you want to add to the third field, not the second.
$ awk '$1=="Bob" {$4=$3+400}1' file | column -t
Philip 13 2
Bob 152 8 408
Bob 4561 2 402
Bob 234 36 436
Bob 98 12 412
Rey 147 152
Rey 15 1547
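Since the question mentions cycling through a list of values, the same one-liner can be parameterized with -v. This is only a sketch: the variable names name and inc, the name:increment pairs, and the temporary file are all made up for illustration.

```shell
tmp=$(mktemp)
printf '%s\n' 'Philip 13 2' 'Bob 152 8' 'Bob 4561 2' 'Rey 147 152' > "$tmp"

# name and inc are illustrative variable names; the pairs in the loop are made up
for pair in Bob:400 Rey:1000; do
  name=${pair%%:*}   # text before the colon
  inc=${pair##*:}    # text after the colon
  awk -v name="$name" -v inc="$inc" '$1 == name { $4 = $3 + inc } 1' "$tmp" > "$tmp.new"
  mv "$tmp.new" "$tmp"
done

out=$(cat "$tmp")
printf '%s\n' "$out"
```

Each pass rewrites the file, so a name that appears in two pairs would have its fourth field overwritten by the later pair.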

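Here, check whether $1 is equal to Bob and, if so, rebuild the record ($0) by appending FS followed by the value of $2 + 400. FS is the field separator, which ends up between the 3rd and the new 4th field. The 1 at the end tells awk to take the default action, which is print. Note that this follows the question's pseudocode and adds to $2; use $3 instead to match the expected output.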
awk '$1=="Bob"{$0=$0 FS $2 + 400}1' file
Philip 13 2
Bob 152 8 552
Bob 4561 2 4961
Bob 234 36 634
Bob 98 12 498
Rey 147 152
Rey 15 1547
Or, if you want to keep the name (Bob) in a variable:
awk -vname="Bob" '$1==name{$0=$0 FS $2 + 400}1' file
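The expression $0 FS $2 + 400 works without parentheses because addition binds tighter than string concatenation in awk. A small sketch to check that, using one sample line from the question (the variable names are only for illustration):

```shell
line='Bob 152 8'

# No parentheses: parsed as $0 FS ($2 + 400)
implicit=$(printf '%s\n' "$line" | awk '{ print $0 FS $2 + 400 }')

# Explicit parentheses around the addition give the same result
explicit=$(printf '%s\n' "$line" | awk '{ print $0 FS ($2 + 400) }')

# Forcing the concatenation first turns the whole string into a number (0) plus 400
forced=$(printf '%s\n' "$line" | awk '{ x = ($0 FS $2) + 400; print x }')

printf '%s\n' "$implicit" "$explicit" "$forced"
```

Writing the parentheses explicitly costs nothing and makes the intent obvious to the next reader.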

1st solution: Could you please try the following as well. It uses awk's built-in variable NF: $NF is the value of the last column of the current line, and assigning to $(NF+1) creates an additional column. The new column gets a value only if the 1st field is the string Bob; otherwise it is left empty.
awk '{$(NF+1)=$1=="Bob"?400+$NF:""} 1' OFS="\t" Input_file
2nd solution: In case we don't want to create a new field and simply want to print the value depending on the condition, try the following; I believe this should be faster.
awk 'BEGIN{OFS="\t"}{$1=$1;print $0,$1=="Bob"?400+$NF:""}' Input_file
Output will be as follows.
Philip 13 2
Bob 152 8 408
Bob 4561 2 402
Bob 234 36 436
Bob 98 12 412
Rey 147 152
Rey 15 1547
Explanation: Adding explanation for above code now.
awk ' ##Starting awk program here.
{
$(NF+1)=$1=="Bob"?400+$NF:"" ##Create a new last field whose value depends on the condition check.
##If the 1st field is the string Bob, add 400 to the value of the last field; otherwise make the new field empty.
}
1 ##awk works on condition-then-action pairs. 1 is an always-true condition with no action defined, so the default action (print the current line) happens.
' OFS="\t" Input_file ##Set OFS, the output field separator, to TAB and name the Input_file here.
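One side effect of the ternary form is that lines that are not Bob also get a (empty) extra field, so they end with a trailing TAB. If that matters, a guarded variant is one way around it. This is a sketch, not part of the original answer, and the two-line sample file is cut down from the question's data.

```shell
tmp=$(mktemp)
printf '%s\n' 'Philip 13 2' 'Bob 152 8' > "$tmp"

# Ternary form from the answer: every line gets a 4th field, possibly empty
ternary=$(awk -v OFS='\t' '{ $(NF+1) = ($1 == "Bob") ? 400 + $NF : "" } 1' "$tmp")

# Guarded form: only Bob lines get a 4th field; $1 = $1 rebuilds every line with OFS
guarded=$(awk -v OFS='\t' '$1 == "Bob" { $(NF+1) = $NF + 400 } { $1 = $1 } 1' "$tmp")

# Count TAB-separated fields per line to make the difference visible
printf '%s\n' "$ternary" | awk -F'\t' '{ print "ternary fields:", NF }'
printf '%s\n' "$guarded" | awk -F'\t' '{ print "guarded fields:", NF }'
```

When the output is piped through column -t the trailing separator is usually harmless, so the ternary form is fine for display purposes.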

Related

Skip operation on the line if one of the columns have letters - bash

I'm trying to skip the calculation on rows where End_time has the value "Failed".
Here is my actual file.
check_time.log
Done City Start_time End_time
Yes Chicago 18 10
Yes Atlanta 208 11
No Minnetonka 57 Failed
Yes Hopkins 112 80
No Marietta 2018 Failed
Here is what I have so far.
awk 'BEGIN { OFS = "\t" } NR == 1 { $5 = "Time_diff" } NR >= 2 { $5 = $3 - $4 } 1' < files |column -t
Output:
Done City Start_time End_time Time_diff
Yes Chicago 18 10 8
Yes Atlanta 208 11 197
No Minnetonka 57 Failed 57
Yes Hopkins 112 80 32
No Marietta 2018 Failed 2018
Desired output should look like this:
Done City Start_time End_time Time_diff
Yes Chicago 18 10 8
Yes Atlanta 208 11 197
No Minnetonka 57 Failed
Yes Hopkins 112 80 32
No Marietta 2018 Failed
So how do I skip that?
You should be just able to change:
$5 = $3 - $4
into:
if ($4 != "Failed") { $5 = $3 - $4 }
This will:
refuse to change $5 from empty to the calculated value in lines where the end time is Failed; and
correctly do the calculation for all other lines.
I say correctly since it appears you want the start time minus the end time in those cases, despite the fact durations tend to be end time minus the start time. I've changed it to match your desired output rather than the "sane" expectation.
A transcript follows so you can see it in action:
pax$ awk 'BEGIN{OFS="\t"}NR==1{$5="Time_diff"}NR>=2{if($4!="Failed"){$5=$3-$4}}1' <inputFile.txt |column -t
Done City Start_time End_time Time_diff
Yes Chicago 18 10 8
Yes Atlanta 208 11 197
No Minnetonka 57 Failed
Yes Hopkins 112 80 32
No Marietta 2018 Failed
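If End_time can hold other non-numeric markers besides Failed, testing that the field looks like a number is a possible alternative to comparing against one string. A sketch under that assumption; the row ending in Error is invented for illustration and is not in the original data.

```shell
tmp=$(mktemp)
printf '%s\n' \
  'Done City Start_time End_time' \
  'Yes Chicago 18 10' \
  'No Minnetonka 57 Failed' \
  'No Marietta 2018 Error' > "$tmp"

# Only compute the difference when End_time consists of digits only
out=$(awk 'NR == 1 { $5 = "Time_diff" }
           NR >= 2 && $4 ~ /^[0-9]+$/ { $5 = $3 - $4 }
           1' "$tmp")
printf '%s\n' "$out"
```

The regex accepts unsigned integers only; decimals or negative values would need a wider pattern.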
And, just as an aside, you may want to consider what will happen when you start getting information from New York, San Antonio, Salt Lake City or, even worse, Maccagno con Pino e Veddasca :-)
Could you please try the following. (This assumes the last fields of your Input_file are always in this order, with no additional fields after them. It counts fields from the end of the line because a city name containing spaces shifts the field numbers counted from the start, so they would differ from line to line.)
awk '
FNR==1{
print $0,"Time_Diff"
next
}
$NF!="Failed"{
$(NF+1)=$(NF-1)-$NF
}
1
' Input_file | column -t
Output will be as follows.
Done City Start_time End_time Time_Diff
Yes Chicago 18 10 8
Yes Atlanta 208 11 197
No Minnetonka 57 Failed
Yes Hopkins 112 80 32
No Marietta 2018 Failed
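To see why counting from the end helps, here is the same program run on city names that contain spaces. The two data rows are made up for this sketch, and column -t is left out because it would split the multi-word names into separate columns.

```shell
tmp=$(mktemp)
printf '%s\n' \
  'Done City Start_time End_time' \
  'Yes New York 18 10' \
  'No Salt Lake City 57 Failed' > "$tmp"

out=$(awk '
  FNR == 1 { print $0, "Time_Diff"; next }             # header line
  $NF != "Failed" { $(NF+1) = $(NF-1) - $NF }          # last two fields, wherever they are
  1
' "$tmp")
printf '%s\n' "$out"
```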
Explanation: Adding complete explanation for above code now.
awk ' ##Starting awk program from here.
FNR==1{ ##Checking condition: if this is the very first line then do the following.
print $0,"Time_Diff" ##Printing current line with string Time_Diff here on very first line to print headings.
next ##next is awk keyword which will skip all further statements from here.
}
$NF!="Failed"{ ##$NF is the value of the last field (NF is the number of fields). If it is NOT Failed then do the following.
$(NF+1)=$(NF-1)-$NF ##Add a new column by doing $(NF+1) whose value will be difference of 2nd last column and last column as per samples.
} ##Closing this condition block here.
1 ##Mentioning 1 will print edited/non-edited line for Input_file.
' Input_file | ##Mentioning Input_file name and passing awk program output to next command by using pipe(|).
column -t ##Using column -t will align the output columns into a table.
If you are considering Perl,
> cat kwa.in
Done City Start_time End_time
Yes Chicago 18 10
Yes Atlanta 208 11
No Minnetonka 57 Failed
Yes Hopkins 112 80
No Marietta 2018 Failed
> perl -lane ' print join(" ",@F,"Time_Diff") if $.==1; if($.>1 ) { $F[4]=$F[2]-$F[3] if not $F[3]=~/Failed/; print join(" ",@F) } ' kwa.in | column -t
Done City Start_time End_time Time_Diff
Yes Chicago 18 10 8
Yes Atlanta 208 11 197
No Minnetonka 57 Failed
Yes Hopkins 112 80 32
No Marietta 2018 Failed

exchange columns based on some conditions

I have a text file with 5 columns. If the number in the 5th column is less than the number in the 3rd column, swap the 4th and 5th columns with the 2nd and 3rd columns. If the number in the 5th column is greater than the one in the 3rd column, leave the line unchanged.
1EAD A 396 B 311
1F3B A 118 B 171
1F5V A 179 B 171
1G73 A 162 C 121
1BS0 E 138 G 230
Desired output
1EAD B 311 A 396
1F3B A 118 B 171
1F5V B 171 A 179
1G73 C 121 A 162
1BS0 E 138 G 230
$ awk '{ if ($5 >= $3) print $0; else print $1"\t"$4"\t"$5"\t"$2"\t"$3; }' foo.txt
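Note that the command above prints swapped lines TAB-separated while untouched lines keep their original spacing. If a uniform separator is wanted, one possible variant is to swap the fields in place and let awk rebuild the line. This is a sketch; t is just a temporary variable.

```shell
out=$(printf '%s\n' '1EAD A 396 B 311' '1F3B A 118 B 171' '1F5V A 179 B 171' |
  awk '$5 < $3 {
         t = $2; $2 = $4; $4 = t    # swap the letter columns
         t = $3; $3 = $5; $5 = t    # swap the number columns
       }
       1')
printf '%s\n' "$out"
```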

How to append the count of numbers in each line of text using awk?

I have several very large text files and would like to prepend the count of numbers, followed by a space, to each line. Could anyone kindly suggest how to do it efficiently using Awk?
Input:
10 109 120 200 1148 1210 1500 5201
9 139 1239 1439 6551
199 5693 5695
Desired Output:
8 10 109 120 200 1148 1210 1500 5201
5 9 139 1239 1439 6551
3 199 5693 5695
You can use
awk '{print NF,$0}' input.txt
It says: print the number of fields of the current line (NF), then the output field separator OFS (a space by default), then the current line itself ($0).
This will also work for you (the trailing 7 is just an always-true condition, so the modified line is printed):
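The comma in print inserts the output field separator OFS, not the input separator FS. A quick sketch to see this, using a colon as an arbitrary OFS:

```shell
line='199 5693 5695'

# Default OFS is a single space
default_sep=$(printf '%s\n' "$line" | awk '{ print NF, $0 }')

# Changing OFS changes what the comma in print emits; $0 itself is untouched
colon_sep=$(printf '%s\n' "$line" | awk -v OFS=':' '{ print NF, $0 }')

printf '%s\n' "$default_sep" "$colon_sep"
```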
awk '{$0=NF FS $0}7' file

matching columns in two files with different line numbers

This is a rather repeated question but I could not figure it out with my files, so, any help will be highly appreciated.
I have two files, I want to compare their first fields and print the common lines into a third file, an example of my files:
file 1:
gene1
gene2
gene3
file 2:
gene1|trans1|12|233|345 45
gene1|trans2|12|342|232 45
gene2|trans2|12|344|343 12
gene2|trans2|12|344|343 45
gene2|trans2|12|344|343 12
gene2|trans3|12|34r|343 325
gene2|trans2|12|344|343 545
gene3|trans4|12|344|333 454
gene3|trans2|12|343|343 545
gene3|trans3|12|344|343 45
gene4|trans2|12|344|343 2112
gene4|trans2|12|344|343 455
file 2 contains more fields. Please note that the first field of file 2 is not identical to the lines of file 1; only the gene element matches.
The output should look like this:
gene1|trans1|12|233|345 45
gene1|trans2|12|342|232 45
gene2|trans2|12|344|343 12
gene2|trans2|12|344|343 45
gene2|trans2|12|344|343 12
gene2|trans3|12|34r|343 325
gene2|trans2|12|344|343 545
gene3|trans4|12|344|333 454
gene3|trans2|12|343|343 545
gene3|trans3|12|344|343 45
I use this code, it does not give me any error but it does not give me any output either:
awk '{if (f[$1] != FILENAME) a[$1]++; f[$1] = FILENAME; } END{ for (i in a) if (a[i] > 1) print i; }' file1 file1
thank you very much
Something like this?
awk -F\| 'FNR==NR {a[$0]++;next} $1 in a' file1 file2
gene1|trans1|12|233|345 45
gene1|trans2|12|342|232 45
gene2|trans2|12|344|343 12
gene2|trans2|12|344|343 45
gene2|trans2|12|344|343 12
gene2|trans3|12|34r|343 325
gene2|trans2|12|344|343 545
gene3|trans4|12|344|333 454
gene3|trans2|12|343|343 545
gene3|trans3|12|344|343 45
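For readers unfamiliar with the FNR==NR idiom, here is the same command spread out with comments. It is a sketch on a cut-down copy of the sample data; the array name seen and the temporary directory are only for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' gene1 gene3 > "$dir/file1"
printf '%s\n' \
  'gene1|trans1|12|233|345 45' \
  'gene2|trans2|12|344|343 12' \
  'gene3|trans4|12|344|333 454' > "$dir/file2"

out=$(awk -F'|' '
  FNR == NR { seen[$0]; next }   # true only while reading the first file: remember each gene
  $1 in seen                     # second file: print lines whose 1st field was remembered
' "$dir/file1" "$dir/file2")
printf '%s\n' "$out"
```

FNR restarts at 1 for every input file while NR keeps counting, so the two are equal only during the first file.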
In this example, grep is sufficient:
grep -w -f file1 file2
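One caveat worth knowing: grep -w -f matches the gene name anywhere on the line, while the awk version checks the first field only. The sample data has no such case, so the second record below is invented to show the difference.

```shell
dir=$(mktemp -d)
printf '%s\n' gene1 > "$dir/file1"
# The second line is a made-up record with gene1 in the 2nd field only
printf '%s\n' \
  'gene1|trans1|12|233|345 45' \
  'gene9|gene1|12|344|343 12' > "$dir/file2"

grep_out=$(grep -w -f "$dir/file1" "$dir/file2")
awk_out=$(awk -F'|' 'FNR == NR { seen[$0]; next } $1 in seen' "$dir/file1" "$dir/file2")

printf 'grep:\n%s\n' "$grep_out"
printf 'awk:\n%s\n' "$awk_out"
```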

Linux red-hat5.4 + awk file manipulation

How can I match PARAM (param=name) in file.txt and print the lines between NAMESx and NAMESy with awk, in the following way:
If PARAM is matched in file.txt, awk should print only the lines between the closest NAMES strings that enclose PARAM, where PARAM is one of the names.
Remark 1: PARAM can be any name, such as Pitter, Bob, etc.
Remark 2: awk will get PARAM=(any name).
Remark 3: we do not know how many spaces there are between # and NAMES.
more file.txt
# NAMES1
Pitter 23
Bob 75
# NAMES2
Donald 54
Josef 85
Patrick 21
# NAMES3
Tom 32
Jennifer 85
Svetlana 25
# NAMES4
examples ( regarding file.txt contents )
In case PARAM=Pitter then awk will print the names to out.txt file
Pitter 23
Bob 75
In case PARAM=Josef then awk will print the names to out.txt file
Donald 54
Josef 85
Patrick 21
In case PARAM=Jennifer then awk will print the names to out.txt file
Tom 32
Jennifer 85
Svetlana 25
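Using awk's RS would be helpful in this case (a multi-character RS is treated as a regex by gawk and mawk; POSIX leaves that behavior unspecified). See the test below: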
testing with example
kent$ cat file
# NAMES1
Pitter 23
Bob 75
# NAMES2
Donald 54
Josef 85
Patrick 21
# NAMES3
Tom 32
Jennifer 85
Svetlana 25
# NAMES4
kent$ awk -vPARAM="Pitter" 'BEGIN{RS="# NAMES."} {if($0~PARAM)print}' file
Pitter 23
Bob 75
kent$ awk -vPARAM="Josef" 'BEGIN{RS="# NAMES."} {if($0~PARAM)print}' file
Donald 54
Josef 85
Patrick 21
kent$ awk -vPARAM="Jennifer" 'BEGIN{RS="# NAMES."} {if($0~PARAM)print}' file
Tom 32
Jennifer 85
Svetlana 25
Note: there are some empty lines in the output because they existed in your input. However, it would be easy to remove them from the output.
Update:
If you have spaces between # and NAMES, you can try:
awk -vPARAM="Pitter" 'BEGIN{RS="# *NAMES."} {if($0~PARAM)print}' file
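A regex RS works in gawk and mawk but is not guaranteed by POSIX, and $0~PARAM also matches partial names. If either matters, a block-collecting variant is one option. This is only a sketch: it assumes the name is the first field of each line and that header lines start with #, and the variable names block and hit are made up.

```shell
tmp=$(mktemp)
printf '%s\n' \
  '# NAMES1' 'Pitter 23' 'Bob 75' \
  '#   NAMES2' 'Donald 54' 'Josef 85' 'Patrick 21' \
  '# NAMES3' > "$tmp"

out=$(awk -v PARAM="Josef" '
  /^#[ ]*NAMES/ {                 # header line: flush the previous block if it matched
    if (hit) printf "%s", block
    block = ""; hit = 0
    next
  }
  { block = block $0 "\n" }       # collect the lines of the current block
  $1 == PARAM { hit = 1 }         # exact, case-sensitive match on the name field
  END { if (hit) printf "%s", block }
' "$tmp")
printf '%s\n' "$out"
```

The END rule covers a file whose last block is not followed by another header line.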