awk to print the fields based on condition

I would like to compare the first and third fields of Input.csv with the second and third fields of Master.csv.
If the fruit name matches and the Amount in Input.csv is less than the Amount in Master.csv, then print the matching lines from both files, joined together.
For example,
Fruits==Apple,Amount <20 from Master.csv need to be checked with Input.csv, so the output would be
Fruits,Region,Amount,Details
Apple,North,10,Abc
Apple,south,9,Abc
Input.csv
Fruits,Region,Amount,Details
Apple,North,10,Abc
Orange,East,115,Def
Apple,south,9,Abc
Apple,West,25,Abc
Orange,West,150,Def
Orange,North,200,Def
Mango,North,50,Ghi
Mango,East,75,Ghi
Master.csv
Details,Fruits,Amount
xxx,Apple,20
yyy,Mango,60
zzz,Cherry,80
Desired Output.csv
Fruits,Region,Amount,Details,Details,Fruits,Amount
Apple,North,10,Abc,xxx,Apple,20
Apple,south,9,Abc,xxx,Apple,20
Mango,North,50,Ghi,yyy,Mango,60
I have tried like below command
awk -F "," 'FNR==NR {a[$1]; b[$3]; next} $2 in a && $3 < b' Input.csv Master.csv > Output.csv

The following awk command should do this (note that Master.csv is read first):
awk -F"," 'FNR==1 && FNR==NR{val=$0} FNR==NR{a[$2]=$3;b[$2]=$0;next} FNR==1 && FNR!=NR{for(i=NF;i>0;i--){val1=val1?$i OFS val1:$i};print val,val1} (($1 in a) && $3<a[$1]){print $0,b[$1]}' OFS=, Master.csv Input.csv
The same solution in multi-line form:
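awk -F"," '
FNR==1 && FNR==NR{                 # first line of Master.csv: save its header
  val=$0
}
FNR==NR{                           # every line of Master.csv
  a[$2]=$3;                        # amount, keyed by fruit
  b[$2]=$0;                        # whole line, keyed by fruit
  next
}
FNR==1 && FNR!=NR{                 # first line of Input.csv: rebuild its header
  for(i=NF;i>0;i--){
    val1=val1?$i OFS val1:$i};
  print val,val1
}
(($1 in a) && $3<a[$1]){           # fruit is in Master.csv and its amount is lower
  print $0,b[$1]
}
' OFS=, Master.csv Input.csv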
Output will be as follows:
Details,Fruits,Amount,Fruits,Region,Amount,Details
Apple,North,10,Abc,xxx,Apple,20
Apple,south,9,Abc,xxx,Apple,20
Mango,North,50,Ghi,yyy,Mango,60
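Note that the header line above lists the Master.csv columns first, while the data rows list the Input.csv columns first. The following is a minimal sketch, assuming the column order of the desired output in the question is what is wanted. The scratch directory and the variable names (tmp, joined, hdr, amt, row) are made up for illustration, and the +0 on both sides is an added precaution to force a numeric comparison.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/Input.csv" <<'EOF'
Fruits,Region,Amount,Details
Apple,North,10,Abc
Orange,East,115,Def
Apple,south,9,Abc
Apple,West,25,Abc
Orange,West,150,Def
Orange,North,200,Def
Mango,North,50,Ghi
Mango,East,75,Ghi
EOF
cat > "$tmp/Master.csv" <<'EOF'
Details,Fruits,Amount
xxx,Apple,20
yyy,Mango,60
zzz,Cherry,80
EOF

joined=$(awk -F, -v OFS=, '
FNR==NR {                            # first file: Master.csv
    if (FNR==1) { hdr=$0; next }     # remember its header, do not index it
    amt[$2]=$3; row[$2]=$0           # index amount and full line by fruit
    next
}
FNR==1 { print $0, hdr; next }       # Input.csv header first, then Master.csv header
($1 in amt) && $3+0 < amt[$1]+0 {    # +0 forces a numeric comparison
    print $0, row[$1]
}' "$tmp/Master.csv" "$tmp/Input.csv")
printf '%s\n' "$joined"
rm -rf "$tmp"
```

With the sample data this should print the four lines shown under "Desired Output.csv" in the question, header included.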

Related

Grouping duplicated fields with awk

I have the following file:
ID|2018-04-29
ID|2018-04-29
ID|2018-04-29
ID1|2018-06-26
ID1|2018-06-26
ID1|2018-08-07
ID1|2018-08-22
and using awk, I want to add $3 that groups the duplicated IDs based on $1 and $2 so that the output would be
ID|2018-04-29|group1
ID|2018-04-29|group1
ID|2018-04-29|group1
ID1|2018-06-26|group2
ID1|2018-06-26|group2
ID1|2018-08-07|group3
ID1|2018-08-22|group4
I tried the following code but it does not give me the desired output. Also, I am not sure if I can apply it to a column with date in it.
awk -F"|" '{print $0,"group"++seen[$1,$3]}' OFS="|"
Any hints on how to achieve it using awk (one-liner, if possible) would be highly appreciated.
With your shown samples, please try following awk code.
awk -v OFS="|" '!arr[$0]++{count++} {print $0,"group"count}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program from here.
OFS="|" ##Setting OFS to | here.
}
!arr[$0]++{ ##Checking if current line is NOT present in array then do following.
count++ ##Increasing count with 1 here.
}
{
print $0,"group"count ##Printing current line with group and count value here.
}
' Input_file ##Mentioning Input_file name here.
The question asks to group the duplicated IDs based on $1 and $2, so the key here is built from those two fields.
If the input file is sorted:
$ awk 'BEGIN{FS=OFS="|"}{print $0, "group" (!a[$1,$2]++?++c:c)}' file
ID|2018-04-29|group1
ID|2018-04-29|group1
ID|2018-04-29|group1
ID1|2018-06-26|group2
ID1|2018-06-26|group2
ID1|2018-08-07|group3
ID1|2018-08-22|group4
If the file is not sorted:
$ awk 'BEGIN{FS=OFS="|"}{k=$1 SUBSEP $2}!(k in a){a[k]=++c}{print $0, "group" a[k]}' file
ID|2018-04-29|group1
ID|2018-04-29|group1
ID|2018-04-29|group1
ID1|2018-06-26|group2
ID1|2018-06-26|group2
ID1|2018-08-07|group3
ID1|2018-08-22|group4
A more readable version:
awk 'BEGIN{
FS=OFS="|"
}
{
k=$1 SUBSEP $2
}
!(k in a){
a[k]=++c
}
{
print $0, "group" a[k]
}' file
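To see why the sorted/unsorted distinction matters, here is a small sketch that runs both one-liners on a hypothetical unsorted input (three lines made up for this illustration, where the first key reappears after a different one). The variable names input, sorted_only and any_order are not from the answer.

```shell
# Hypothetical unsorted input: the first key reappears after a different one.
input='ID|2018-04-29
ID1|2018-06-26
ID|2018-04-29'

# Counter-based version: a repeated key just reuses the current counter value.
sorted_only=$(printf '%s\n' "$input" |
  awk 'BEGIN{FS=OFS="|"}{print $0, "group" (!a[$1,$2]++?++c:c)}')

# Lookup-based version: each key remembers the group number it was first given.
any_order=$(printf '%s\n' "$input" |
  awk 'BEGIN{FS=OFS="|"}{k=$1 SUBSEP $2}!(k in a){a[k]=++c}{print $0, "group" a[k]}')

printf '%s\n\n%s\n' "$sorted_only" "$any_order"
```

The third line ends up in group2 with the first command and in group1 with the second, which is why the first form is only suitable for sorted input.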
BEGIN {OFS = FS = "|"}
{
    if ($0 != prev) { # new item
        prev = $0
        print $1, $2, "group" ++g
    }
    else {
        print $1, $2, "group" g
    }
}
Note that the list has to be sorted (from your example, I assume it is).
This is my first time posting answer here. Hope the code is readable for you and hope it helps.

awk search pattern in a specific field and replace its content

I need to find password fields that are empty or contain only a space or a tab, and replace them with x (in the /etc/passwd file).
I found this awk syntax, which shows the users whose second field (using : as the delimiter) is empty or contains a space or a tab:
awk -F":" '($2 == "" || $2 == " " || $2 == "\t") {print $0}' $file
and the result is as follows:
user1::53556:100::/home/user1:/bin/bash
user2: :53557:100::/home/user2:/bin/bash
user3: :53558:100::/home/user3:/bin/bash
How can I tell awk to replace this 2nd field (empty, or containing a space or a tab) with another character (for example x)?
Could you please try following.
awk 'BEGIN{FS=OFS=":"} {$2=$2=="" || $2~/^[[:space:]]+$/?"X":$2} 1' Input_file
Explanation: Adding explanation of above code.
awk ' ##Starting awk program here.
BEGIN{ ##Starting BEGIN section here which will be executed before Input_file is being read.
FS=OFS=":" ##Setting FS and OFS as colon here for all lines of Input_file.
} ##Closing BEGIN section block here.
{
$2=$2=="" || $2~/^[[:space:]]+$/?"X":$2 ##If $2 (the 2nd field) of the current line is empty or consists only of whitespace, set its value to X; otherwise keep $2 as it is.
}
1 ##mentioning 1 will print edited/non-edited current line.
' Input_file ##Mentioning Input_file name here.
EDIT: As per the OP, the last line of Input_file must NOT be touched, so adding the following solution now.
tac Input_file | awk 'BEGIN{FS=OFS=":"} FNR==1{print;next} {$2=$2=="" || $2~/^[[:space:]]+$/?"X":$2} 1' | tac
EDIT2: In case you want to do it in a single awk, try the following.
awk '
BEGIN{
FS=OFS=":"
}
prev{
num=split(prev,array,":")
array[2]=array[2]=="" || array[2]~/^[[:space:]]+$/?"X":array[2]
for(i=1;i<=num;i++){
val=(val?val OFS array[i]:array[i])
}
print val
val=""
}
{
prev=$0
}
END{
if(prev){
print prev
}
}' Input_file
In case you want to change Input_file itself, append > temp_file && mv temp_file Input_file to the above command.
$ awk 'BEGIN{FS=OFS=":"} (NF>1) && ($2~/^[[:space:]]*$/){$2="x"} 1' file
user1:x:53556:100::/home/user1:/bin/bash
user2:x:53557:100::/home/user2:/bin/bash
user3:x:53558:100::/home/user3:/bin/bash
To change the original file using GNU awk:
awk -i inplace 'BEGIN{FS=OFS=":"} (NF>1) && ($2~/^[[:space:]]*$/){$2="x"} 1' file
or with any awk:
awk 'BEGIN{FS=OFS=":"} (NF>1) && ($2~/^[[:space:]]*$/){$2="x"} 1' file > tmp && mv tmp file
The test for NF>1 ensures we only operate on lines that already have at least 2 fields and so we don't create a line like :x in the output when there's an empty line in the input file. The rest is hopefully obvious.
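The effect of the NF>1 test can be sketched by running the command with and without it on a hypothetical input that contains an empty line (the two user lines are taken from the question; the empty line and the variable names are added for illustration).

```shell
# Hypothetical input with an empty line between two passwd-style entries.
input='user1::53556:100::/home/user1:/bin/bash

user2: :53557:100::/home/user2:/bin/bash'

# Without the NF>1 guard the empty line also matches and is rebuilt as ":x".
unguarded=$(printf '%s\n' "$input" |
  awk 'BEGIN{FS=OFS=":"} ($2~/^[[:space:]]*$/){$2="x"} 1')

# With the guard the empty line is printed unchanged.
guarded=$(printf '%s\n' "$input" |
  awk 'BEGIN{FS=OFS=":"} (NF>1) && ($2~/^[[:space:]]*$/){$2="x"} 1')

printf 'unguarded:\n%s\n\nguarded:\n%s\n' "$unguarded" "$guarded"
```

Assigning to $2 on a line with no fields makes awk create fields 1 and 2 and rebuild the record with OFS, which is where the :x line comes from in the unguarded run.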

How to not remove the header while executing awk

I have a file file like this :
k_1_1
k_1_3
k_1_6
...
I have a file file2 :
0,1,2,3,...
k_1_1,17,16,15,...
k_1_2,17,89,15,...
k_1_3,10,26,45,...
k_1_4,17,16,15,...
k_1_5,10,26,45,...
k_1_6,17,16,15,...
...
I want to print the lines of file2 that match file. The desired output is:
0,1,2,3,...
k_1_1,17,16,15,...
k_1_3,10,26,45,...
k_1_6,17,16,15,...
I tried
awk 'BEGIN{FS=OFS=","}NR==FNR{a[$1];next}$1 in a {print $0}' file file2 > result
But the header line is gone in result like this :
k_1_1,17,16,15,...
k_1_3,10,26,45,...
k_1_6,17,16,15,...
How can I keep it? Thank you.
Always print the first line, unconditionally.
awk 'BEGIN{FS=OFS=","}
NR==FNR{a[$1];next}
FNR==1 || $1 in a' file file2 > result
Notice also how { print $0 } is not necessary because it's the default action.
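As a quick check, the command can be run on shortened versions of the two files. This is only a sketch: the rows below are the question's samples with the elided trailing columns left out, and the scratch directory and the result variable are assumptions made for the example.

```shell
tmp=$(mktemp -d)
# Shortened, hypothetical versions of file and file2 (elided columns left out).
printf '%s\n' k_1_1 k_1_3 k_1_6 > "$tmp/file"
cat > "$tmp/file2" <<'EOF'
0,1,2,3
k_1_1,17,16,15
k_1_2,17,89,15
k_1_3,10,26,45
k_1_4,17,16,15
k_1_5,10,26,45
k_1_6,17,16,15
EOF

# FNR==1 is true for the first line of file2, so the header is always printed.
result=$(awk -F, 'NR==FNR{a[$1];next} FNR==1 || $1 in a' "$tmp/file" "$tmp/file2")
printf '%s\n' "$result"
rm -rf "$tmp"
```

The first line of file is consumed by the NR==FNR block before the FNR==1 test is reached, so only the header of file2 is printed unconditionally.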
A very ad-hoc solution to your problem could be to compose the output in a command group:
{ head -1 file2; awk 'BEGIN{FS=OFS=","}NR==FNR{a[$1];next}$1 in a {print $0}' file file2; } > result
Could you please try following.
awk -F, 'FNR==NR{a[$1]=$0;next} FNR==1 && ++count==1{print;next} a[$1]' Input_file Input_file2
OR
awk -F, 'FNR==NR{a[$1]=$0;next} FNR==1{print;next} a[$1]' Input_file Input_file2

awk to count of occurrences then split into two file

I would like to count the number of occurrences of each value in the $2 field and then split the input file into two output files:
if a $2 value occurs more than 3 times, those lines are redirected to OpFile11.txt; otherwise they are redirected to OpFile22.txt.
Input.csv
Des1,Location,Decs2
aaa,a123,xxx
bbb,b789,yyy
xxx,a123,aaa
aaa,a123,xxx
bbb,b789,yyy
ccc,c567,zzz
xxx,a123,aaa
ddd,d456,ddd
OpFile11.txt
aaa,a123,xxx
xxx,a123,aaa
aaa,a123,xxx
xxx,a123,aaa
OpFile22.txt
bbb,b789,yyy
bbb,b789,yyy
ccc,c567,zzz
ddd,d456,ddd
Step#1 : Counting the number of occurrences:
awk -F, '{key=$2;++a[key]} END {for(i in a) print i","a[i]}' Input.csv
d456,1
b789,2
c567,1
a123,4
Step#2 : Splitting the input file into two parts:
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$1]=$0;next} ($2 in a) { print $0 }' OccurGR3.csv Input.csv > OpFile11.txt
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$1]=$0;next} !($2 in a) { print $0 }' OccurGR3.csv Input.csv > OpFile22.txt
where OccurGR3.csv contains:
a123,4
Please suggest a way to avoid these three steps. Looking forward to your suggestions.
awk -F, '
NR==FNR { cnt[$2]++; next }
{ print > ( "OpFile" (cnt[$2]>3?11:22) ".txt" ) }
' Input.csv Input.csv
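The command above reads Input.csv twice: the first pass counts each $2 value and the second pass routes every line by its count. It treats the header line as data, so the header lands in the low-count file. Below is a sketch under two assumptions that are not confirmed by the question: that the header should be dropped, and that "more than 3 times" means a count strictly greater than 3. The scratch directory, the out prefix and the high/low variables are made up for the example.

```shell
tmp=$(mktemp -d)
cat > "$tmp/Input.csv" <<'EOF'
Des1,Location,Decs2
aaa,a123,xxx
bbb,b789,yyy
xxx,a123,aaa
aaa,a123,xxx
bbb,b789,yyy
ccc,c567,zzz
xxx,a123,aaa
ddd,d456,ddd
EOF

awk -F, -v out="$tmp/OpFile" '
FNR==1  { next }                     # assumption: skip the header on both passes
NR==FNR { cnt[$2]++; next }          # pass 1: count each Location value
{
    n = (cnt[$2] > 3 ? 11 : 22)      # assumption: strictly more than 3 goes to 11
    print > (out n ".txt")           # pass 2: route the line to its file
}' "$tmp/Input.csv" "$tmp/Input.csv"

high=$(cat "$tmp/OpFile11.txt")
low=$(cat "$tmp/OpFile22.txt")
printf 'OpFile11.txt:\n%s\n\nOpFile22.txt:\n%s\n' "$high" "$low"
rm -rf "$tmp"
```

With the sample data the a123 lines (count 4) go to OpFile11.txt and all other lines go to OpFile22.txt, in their original order.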

print unique lines based on field

I would like to print unique lines based on the first field: keep the first occurrence of each value and remove the later duplicates.
Input.csv
10,15-10-2014,abc
20,12-10-2014,bcd
10,09-10-2014,def
40,06-10-2014,ghi
10,15-10-2014,abc
Desired Output:
10,15-10-2014,abc
20,12-10-2014,bcd
40,06-10-2014,ghi
I have tried the command below, but it is incomplete:
awk 'BEGIN { FS = OFS = "," } { !seen[$1]++ } END { for ( i in seen) print $0}' Input.csv
Looking for your suggestions ...
You put your test for "seen" in the action part of the script instead of the condition part. Change it to:
awk -F, '!seen[$1]++' Input.csv
Yes, that's the whole script:
$ cat Input.csv
10,15-10-2014,abc
20,12-10-2014,bcd
10,09-10-2014,def
40,06-10-2014,ghi
10,15-10-2014,abc
$
$ awk -F, '!seen[$1]++' Input.csv
10,15-10-2014,abc
20,12-10-2014,bcd
40,06-10-2014,ghi
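For readers new to the idiom, here is a sketch that spells out what the pattern !seen[$1]++ does, next to a longhand equivalent. The longhand form and the variable names (input, short, long) are written for illustration and are not part of the answer.

```shell
input='10,15-10-2014,abc
20,12-10-2014,bcd
10,09-10-2014,def
40,06-10-2014,ghi
10,15-10-2014,abc'

# Idiom: the pattern is true only while the counter for this key is still 0.
short=$(printf '%s\n' "$input" | awk -F, '!seen[$1]++')

# Longhand equivalent: print a line only the first time its key appears.
long=$(printf '%s\n' "$input" | awk -F, '
{
    if (seen[$1] == 0) print   # unset entries compare equal to 0
    seen[$1]++                 # remember that this key has now been seen
}')

printf '%s\n\n%s\n' "$short" "$long"
```

Both commands print the same three lines, in input order, because each line is printed as it is read rather than from an END loop.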
This should give you what you want:
awk -F, '{ if (!($1 in a)) a[$1] = $0; } END '{ for (i in a) print a[i]}' input.csv
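There is a typo in the syntax above: a stray quote after END. The corrected command is:
awk -F, '{ if (!($1 in a)) a[$1] = $0 } END { for (i in a) print a[i] }' Input.csv
Note that for (i in a) visits the keys in an unspecified order, so the output order may differ from the input order.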