How to add two field separators to same table by awk? - awk

I have a table of 100 rows like the following; these are the first 2 rows:
25 Mike Robin 115 DDA
37 Tom Murray 98 SSM
I want to rewrite the whole table like this:
25_Mike_Robin Mike Robin 115 DDA
37_Tom_Murray Tom Murray 98 SSM
I tried:
awk '{OFS="_"}; {print $1, $2, $3} {print $2, $3, $4, $5}' test.txt
The output came out like this: the two prints land on separate rows instead of one row with two different separators:
25_Mike_Robin
Mike_Robin_115_DDA
37_Tom_Murray
Tom Murray 98 SSM
Then I tried this, but it gave me a syntax error:
awk '{OFS="_"}; {print $1, $2, $3}; {OFS="\t"; {print $2, $3, $4,$5}' test.vcf
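For what it's worth, the second attempt fails for two reasons: the braces are unbalanced (hence the syntax error), and two separate print statements always emit two output records. One minimal sketch of the intended result, assuming every line has exactly five fields, builds the underscore-joined key as a plain string and never touches OFS:

```shell
# Build the "_"-joined key by string concatenation; the commas between
# the remaining arguments use the default OFS (a single space).
printf '25 Mike Robin 115 DDA\n37 Tom Murray 98 SSM\n' |
awk '{ print $1 "_" $2 "_" $3, $2, $3, $4, $5 }'
```

This avoids mixing two OFS values entirely, since the underscores are written literally rather than acting as a field separator.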

A one-liner:
awk '{$1=$1"_"$2"_"$3}1' file

Sample input:
$ cat input
25 Mike Robin 115 DDA
39 Sarah Cook 223 DDC
127 Elizabeth Johnstone 68 XP3
One awk idea where we redefine the 1st field:
$ awk '{$1=$1 "_" $2 "_" $3; print}' input
25_Mike_Robin Mike Robin 115 DDA
39_Sarah_Cook Sarah Cook 223 DDC
127_Elizabeth_Johnstone Elizabeth Johnstone 68 XP3

Using awk:
awk '{printf "%s_%s_%s %s %s %s %s\n", $1, $2, $3, $2, $3, $4, $5}' <<< "25 Mike Robin 115 DDA"
25_Mike_Robin Mike Robin 115 DDA
or
awk '{print $1"_"$2"_"$3" "$2" "$3" "$4" "$5}' <<< "25 Mike Robin 115 DDA"
25_Mike_Robin Mike Robin 115 DDA
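If awk isn't a hard requirement, the same transformation can be sketched with sed by capturing the first three space-separated fields (using `sed -E` for extended regexes, which both GNU and BSD sed support):

```shell
# \1 \2 \3 capture the first three fields; the replacement prepends
# them joined by "_" and re-emits the two name fields unchanged.
sed -E 's/^([^ ]+) ([^ ]+) ([^ ]+)/\1_\2_\3 \2 \3/' <<< "25 Mike Robin 115 DDA"
```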

Related

conditional awk statement to create a new field with additive value

Question
How would I use awk to create a new field that has $2 + a constant value?
I am planning to cycle through a list of values, but I wouldn't mind using a one-liner for each command.
PseudoCode
awk '$1 == Bob {$4 = $2 + 400}' file
Sample Data
Philip 13 2
Bob 152 8
Bob 4561 2
Bob 234 36
Bob 98 12
Rey 147 152
Rey 15 1547
Expected Output
Philip 13 2
Bob 152 8 408
Bob 4561 2 402
Bob 234 36 436
Bob 98 12 412
Rey 147 152
Rey 15 1547
Just quote Bob; also, you want to add the third field, not the second:
$ awk '$1=="Bob" {$4=$3+400}1' file | column -t
Philip 13 2
Bob 152 8 408
Bob 4561 2 402
Bob 234 36 436
Bob 98 12 412
Rey 147 152
Rey 15 1547
Here, check whether $1 equals Bob and, if so, reconstruct the record ($0) by appending FS followed by $2 + 400 to $0. Here FS is the field separator used between the 3rd and 4th fields. The 1 at the end tells awk to take the default action, which is print.
awk '$1=="Bob"{$0=$0 FS $2 + 400}1' file
Philip 13 2
Bob 152 8 552
Bob 4561 2 4961
Bob 234 36 634
Bob 98 12 498
Rey 147 152
Rey 15 1547
Or, if you want to keep the name (Bob) as a variable:
awk -vname="Bob" '$1==name{$0=$0 FS $2 + 400}1' file
1st solution: Could you please try the following too. I am using awk's out-of-the-box NF variable here: $NF denotes the value of the last column of the current line, and $(NF+1) will create an additional column when the condition (the 1st field being the string Bob) is TRUE.
awk '{$(NF+1)=$1=="Bob"?400+$NF:""} 1' OFS="\t" Input_file
2nd solution: In case we don't want to create a new field and simply want to print the values according to the condition, then try the following; I believe this should be faster.
awk 'BEGIN{OFS="\t"}{$1=$1;print $0,$1=="Bob"?400+$NF:""}' Input_file
Output will be as follows.
Philip 13 2
Bob 152 8 408
Bob 4561 2 402
Bob 234 36 436
Bob 98 12 412
Rey 147 152
Rey 15 1547
Explanation: adding an explanation for the above code now.
awk ' ##Starting awk program here.
{
$(NF+1)=$1=="Bob"?400+$NF:"" ##Creating a new last field whose value depends on the condition check:
##if the 1st field contains the string Bob, add 400 to the last field value; otherwise make it NULL.
}
1 ##awk works on the principle of condition-then-action; mentioning 1 makes the condition TRUE, and with no action defined the default action, printing the current line, happens.
' OFS="\t" Input_file ##Setting OFS (the output field separator) to TAB and mentioning the Input_file name here.

Skip operation on the line if one of the columns has letters - bash

I'm trying to skip the operation on rows where End_time has the value "Failed".
Here is my actual file.
check_time.log
Done City Start_time End_time
Yes Chicago 18 10
Yes Atlanta 208 11
No Minnetonka 57 Failed
Yes Hopkins 112 80
No Marietta 2018 Failed
Here is what I have so far.
awk 'BEGIN { OFS = "\t" } NR == 1 { $5 = "Time_diff" } NR >= 2 { $5 = $3 - $4 } 1' < files |column -t
Output:
Done City Start_time End_time Time_diff
Yes Chicago 18 10 8
Yes Atlanta 208 11 197
No Minnetonka 57 Failed 57
Yes Hopkins 112 80 32
No Marietta 2018 Failed 2018
Desired output should look like this:
Done City Start_time End_time Time_diff
Yes Chicago 18 10 8
Yes Atlanta 208 11 197
No Minnetonka 57 Failed
Yes Hopkins 112 80 32
No Marietta 2018 Failed
So how do I skip that?
You should just be able to change:
$5 = $3 - $4
into:
if ($4 != "Failed") { $5 = $3 - $4 }
This will:
refuse to change $5 from empty to the calculated value in lines where the end time is Failed; and
correctly do the calculation for all other lines.
I say correctly since it appears you want the start time minus the end time in those cases, despite the fact durations tend to be end time minus the start time. I've changed it to match your desired output rather than the "sane" expectation.
A transcript follows so you can see it in action:
pax$ awk 'BEGIN{OFS="\t"}NR==1{$5="Time_diff"}NR>=2{if($4!="Failed"){$5=$3-$4}}1' <inputFile.txt |column -t
Done City Start_time End_time Time_diff
Yes Chicago 18 10 8
Yes Atlanta 208 11 197
No Minnetonka 57 Failed
Yes Hopkins 112 80 32
No Marietta 2018 Failed
And, just as an aside, you may want to consider what will happen when you start getting information from New York, San Antonio, Salt Lake City or, even worse, Maccagno con Pino e Veddasca :-)
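If markers other than the literal word "Failed" might appear in End_time, one hedged variant is to guard the subtraction on the field being purely numeric instead (a sketch assuming the same four-column layout and integer times):

```shell
# Subtract only when End_time is all digits, so any non-numeric
# marker (not just the word "Failed") leaves the row untouched.
printf '%s\n' 'Done City Start_time End_time' \
              'Yes Chicago 18 10' \
              'No Minnetonka 57 Failed' |
awk 'BEGIN{OFS="\t"}
     NR==1                   { $5 = "Time_diff" }
     NR>1 && $4 ~ /^[0-9]+$/ { $5 = $3 - $4 }
     1' | column -t
```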
You could also try the following. (This assumes your Input_file's last fields always come in this order with no additional fields; if a city's value contains a space, the field positions counted from the start will differ between lines and you may need to adjust the field numbers.)
awk '
FNR==1{
print $0,"Time_Diff"
next
}
$NF!="Failed"{
$(NF+1)=$(NF-1)-$NF
}
1
' Input_file | column -t
Output will be as follows.
Done City Start_time End_time Time_Diff
Yes Chicago 18 10 8
Yes Atlanta 208 11 197
No Minnetonka 57 Failed
Yes Hopkins 112 80 32
No Marietta 2018 Failed
Explanation: adding a complete explanation for the above code now.
awk ' ##Starting awk program from here.
FNR==1{ ##Check the condition: if this is the very first line, then do the following.
print $0,"Time_Diff" ##Print the current line with the string Time_Diff appended, to print the headings on the very first line.
next ##next is an awk keyword that skips all further statements for this line.
}
$NF!="Failed"{ ##Check whether the last field ($NF, where NF is the number of fields) is NOT "Failed"; if so, do the following.
$(NF+1)=$(NF-1)-$NF ##Add a new column via $(NF+1), whose value is the difference of the 2nd-last and last columns, as per the samples.
} ##Closing this condition block here.
1 ##Mentioning 1 will print the edited/non-edited lines of the Input_file.
' Input_file | ##Mentioning the Input_file name and passing the awk program output to the next command using a pipe (|).
column -t ##column -t aligns the output into neat columns.
If you are considering Perl,
> cat kwa.in
Done City Start_time End_time
Yes Chicago 18 10
Yes Atlanta 208 11
No Minnetonka 57 Failed
Yes Hopkins 112 80
No Marietta 2018 Failed
> perl -lane ' print join(" ",@F,"Time_Diff") if $.==1; if($.>1 ) { $F[4]=$F[2]-$F[3] if not $F[3]=~/Failed/; print join(" ",@F) } ' kwa.in | column -t
Done City Start_time End_time Time_Diff
Yes Chicago 18 10 8
Yes Atlanta 208 11 197
No Minnetonka 57 Failed
Yes Hopkins 112 80 32
No Marietta 2018 Failed

matching columns in two files with different line numbers

This is a rather frequently asked question, but I could not figure it out with my files, so any help will be highly appreciated.
I have two files, I want to compare their first fields and print the common lines into a third file, an example of my files:
file 1:
gene1
gene2
gene3
file 2:
gene1|trans1|12|233|345 45
gene1|trans2|12|342|232 45
gene2|trans2|12|344|343 12
gene2|trans2|12|344|343 45
gene2|trans2|12|344|343 12
gene2|trans3|12|34r|343 325
gene2|trans2|12|344|343 545
gene3|trans4|12|344|333 454
gene3|trans2|12|343|343 545
gene3|trans3|12|344|343 45
gene4|trans2|12|344|343 2112
gene4|trans2|12|344|343 455
file 2 contains more fields. Please note that its first field is not exactly the same as in the first file; only the gene element matches.
The output should look like this:
gene1|trans1|12|233|345 45
gene1|trans2|12|342|232 45
gene2|trans2|12|344|343 12
gene2|trans2|12|344|343 45
gene2|trans2|12|344|343 12
gene2|trans3|12|34r|343 325
gene2|trans2|12|344|343 545
gene3|trans4|12|344|333 454
gene3|trans2|12|343|343 545
gene3|trans3|12|344|343 45
I used this code; it does not give me any error, but it does not give me any output either:
awk '{if (f[$1] != FILENAME) a[$1]++; f[$1] = FILENAME; } END{ for (i in a) if (a[i] > 1) print i; }' file1 file1
thank you very much
Something like this?
awk -F\| 'FNR==NR {a[$0]++;next} $1 in a' file1 file2
gene1|trans1|12|233|345 45
gene1|trans2|12|342|232 45
gene2|trans2|12|344|343 12
gene2|trans2|12|344|343 45
gene2|trans2|12|344|343 12
gene2|trans3|12|34r|343 325
gene2|trans2|12|344|343 545
gene3|trans4|12|344|333 454
gene3|trans2|12|343|343 545
gene3|trans3|12|344|343 45
In this example, grep is sufficient:
grep -w -f file1 file2
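One caveat: grep treats each line of file1 as an unanchored pattern, so a gene name occurring in a later field would also match. A hedged sketch that anchors each name to the start of the line (throwaway sample files stand in for the real ones; `grep -f -` reads patterns from stdin, which GNU grep supports):

```shell
printf 'gene1\n'                       > file1   # sample gene list
printf 'gene1|t1|1 2\nx|gene1|3 4\n'   > file2   # gene1 also in 2nd field
# Turn each name into the anchored pattern "^name|" so it can only
# match as the leading field of file2.
sed 's/^/^/; s/$/|/' file1 | grep -f - file2
rm -f file1 file2
```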

awk print multiple column file into single column

My file looks like this:
315
717
461 737
304
440
148 206 264 322 380 438 496
801
495
355
249 989
768
946
I want to print all those columns in a single column file (one long first column).
If I try to
awk '{print $1}' file > new_file;
awk '{print $2}' file >> new_file
there are blank gaps in between. How do I solve this?
Perhaps a bit cryptic:
awk '1' RS='[[:space:]]+' inputfile
It says "print every record, treating any run of whitespace as a record separator". (Note that a regular-expression RS is a gawk/mawk extension; POSIX awk only honours a single-character RS.)
You can simply use something like:
awk '{ for (i=1; i<=NF; i++) print $i }' file
For each line, iterate through columns, and print each column in a new line.
You don't even need sed for this: just translate spaces to newlines.
tr ' ' '\n' < file
tr is purely a filter, so you have to redirect a file into it.
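One caveat with plain tr: a run of two or more spaces becomes two or more newlines, i.e. blank lines in the output. The -s flag squeezes those repeats; a small sketch:

```shell
# -s squeezes repeated output characters, so the double space in
# "461  737" yields one newline instead of a blank line.
printf '461  737\n304\n' | tr -s ' ' '\n'
```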
A perl solution:
perl -pe 's/\s+(?=\S)/\n/g' infile
Output:
315
717
461
737
304
440
148
206
264
322
380
438
496
801
495
355
249
989
768
946

Linux red-hat5.4 + awk file manipulation

How do I match a PARAM word (PARAM=name) in file.txt and print the lines between
NAMEx and NAMEy via awk, in the following way:
if PARAM is matched in file.txt, then awk should print only the lines between the enclosing NAMES markers, PARAM being one of those names.
Remark 1: PARAM can be any name, such as Pitter, Bob, etc.
Remark 2: awk will receive PARAM=(any name).
Remark 3: we do not know how many spaces there are between # and NAME.
more file.txt
# NAMES1
Pitter 23
Bob 75
# NAMES2
Donald 54
Josef 85
Patrick 21
# NAMES3
Tom 32
Jennifer 85
Svetlana 25
# NAMES4
Examples (regarding the file.txt contents):
In case PARAM=Pitter, awk will print these names to the out.txt file:
Pitter 23
Bob 75
In case PARAM=Josef, awk will print these names to the out.txt file:
Donald 54
Josef 85
Patrick 21
In case PARAM=Jennifer, awk will print these names to the out.txt file:
Tom 32
Jennifer 85
Svetlana 25
Using awk's RS would be helpful in this case; see the test below.
Testing with the example:
kent$ cat file
# NAMES1
Pitter 23
Bob 75
# NAMES2
Donald 54
Josef 85
Patrick 21
# NAMES3
Tom 32
Jennifer 85
Svetlana 25
# NAMES4
kent$ awk -vPARAM="Pitter" 'BEGIN{RS="# NAMES."} {if($0~PARAM)print}' file
Pitter 23
Bob 75
kent$ awk -vPARAM="Josef" 'BEGIN{RS="# NAMES."} {if($0~PARAM)print}' file
Donald 54
Josef 85
Patrick 21
kent$ awk -vPARAM="Jennifer" 'BEGIN{RS="# NAMES."} {if($0~PARAM)print}' file
Tom 32
Jennifer 85
Svetlana 25
Note: there are some empty lines in the output because they existed in your input; however, it would be easy to remove them from the output.
Update
If you have spaces between # and NAMES, you can try:
awk -vPARAM="Pitter" 'BEGIN{RS="# *NAMES."} {if($0~PARAM)print}' file
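Worth noting: a multi-character (regex) RS such as "# *NAMES." is a gawk/mawk extension; POSIX awk only honours a single-character RS. A portable sketch with the same behaviour collects each block into a buffer and prints it when the next # NAMES marker arrives (assuming every block is delimited by such markers, as in the sample):

```shell
# Buffer lines of the current block; on each "# NAMES" header, flush
# the buffer if it contains PARAM, then start a new block.
awk -v PARAM="Josef" '
    /^#[[:space:]]*NAMES/ { if (buf ~ PARAM) printf "%s", buf; buf = ""; next }
    { buf = buf $0 "\n" }
    END { if (buf ~ PARAM) printf "%s", buf }
' <<'EOF'
# NAMES1
Pitter 23
Bob 75
# NAMES2
Donald 54
Josef 85
Patrick 21
# NAMES3
EOF
```

The `[[:space:]]*` also covers the case of extra spaces between # and NAMES from the update above.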