If specific columns match, add them to the same row using bash/awk

I have data that looks like this:
client1 5 10 12 17
client1 6 8 3 20
client1 3 2 2 2
client2 3 3 3 3
client2 4 4 0 0
client2 0 3 3 9
...
client100 3 3 2 1
client100 1 1 1 2
client100 3 3 4 4
I want to end up with only one row per client, with the information from all of that client's rows merged into one. For example, client1 and client2 would look like this when merged (but I need all the clients merged):
client1 5 10 12 17 6 8 3 20 3 2 2 2
client2 3 3 3 3 4 4 0 0 0 3 3 9
This is what I tried:
awk '{ x[$1]=x[$1] " " $2; y[$2]=y[$2] " " $1; }
END {
for (k in x) print k,x[k] >"OUTPUT1";
for (k in y) print k,y[k] >"OUTPUT2";
}' INPUT

When you're merging, you need to add all the fields, not just field 2. The easiest way to do this is to empty field 1, then append the whole record to the array entry.
awk '{ client = $1; $1 = ""; x[client] = x[client] $0 }
END { for (k in x) print k x[k] }' INPUT
I'm not sure what your array y was for. There doesn't seem to be any reason for an array that uses the second field as the keys.
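
For reference, a run on just the client1 and client2 rows from the question (note that for (k in x) iterates in an unspecified order, so with many clients the merged lines may come out in any order):
$ awk '{ client = $1; $1 = ""; x[client] = x[client] $0 }
END { for (k in x) print k x[k] }' INPUT
client1 5 10 12 17 6 8 3 20 3 2 2 2
client2 3 3 3 3 4 4 0 0 0 3 3 9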

Could you please try the following. It should produce the output in the same order in which $1 first occurs in Input_file.
awk '
{
gsub(/\r/,"")
}
!a[$1]++{
b[++count]=$1
}
{
val=$1
$1=""
sub(/^ +/,"")
c[val]=(c[val]?c[val] OFS:"")$0
}
END{
for(i=1;i<=count;i++){
print b[i],c[b[i]]
}
}
' Input_file
Explanation: a detailed explanation of the above code.
awk ' ##Starting the awk program.
{
gsub(/\r/,"") ##Removing carriage returns, in case the file has DOS line endings.
}
!a[$1]++{ ##Checking if $1 is NOT yet in array a; if so, do the following.
b[++count]=$1 ##Creating array b with index count and value $1, to remember first-seen order.
}
{
val=$1 ##Creating a variable val whose value is $1.
$1="" ##Nullifying $1 here.
sub(/^ +/,"") ##Removing the leading space left behind by emptying $1.
c[val]=(c[val]?c[val] OFS:"")$0 ##Appending the rest of the line to array c at index val, OFS-separated on each repeat visit.
}
END{
for(i=1;i<=count;i++){ ##Starting a for loop from i=1 to the value of count.
print b[i],c[b[i]] ##Printing array b at index i, then array c at index b[i].
}
}
' Input_file ##Mentioning Input_file name here.
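
For the sample rows in the question, the script above prints the clients in the order they first appear:
client1 5 10 12 17 6 8 3 20 3 2 2 2
client2 3 3 3 3 4 4 0 0 0 3 3 9
...
client100 3 3 2 1 1 1 1 2 3 3 4 4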


awk Sum column from multi-files and put the result in one file

f1
1 5
2 6
3 10
f2
1 8
2 12
3 10
f3
1 6
2 9
3 20
I want to sum column 2 in f1, f2, and f3, and put the result in f4:
1 21 30 35
I used this command:
awk '{sum+=$1;}END{print sum;}' f1>f4
How can I add the sums of the other files?
awk '{ f[FILENAME]+=$2; }
END{ for (i in f) { h=sprintf("%s%s%s",h,sep,i);
d=sprintf("%s%s%s",d,sep,f[i]); sep="," };
print h; print d}' f1 f2 f3 > f4
output:
$ cat f4
f1,f2,f3
21,30,35
The variable FILENAME holds the name of the input file that is currently being read.
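One caveat with this solution: for (i in f) iterates in an unspecified order, so with more files the columns are not guaranteed to come out in command-line order. A quick way to see FILENAME in action, firing once per file via FNR==1:
$ awk 'FNR==1 { print "reading", FILENAME }' f1 f2 f3
reading f1
reading f2
reading f3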
Could you please try the following. It keeps the files in the order they are read:
awk '
FNR==1{
if(sum){
total_sum=(total_sum?total_sum",":"")sum
}
sum=""
header=(header?header",":"")FILENAME
}
{
sum+=$2
}
END{
print header ORS total_sum","sum
}
' file* > file4
Explanation: an explanation of the above code.
awk ' ##Starting the awk program from here.
FNR==1{ ##Checking condition FNR==1, TRUE when the first line of each Input_file is being read.
if(sum){ ##Checking if variable sum is NOT NULL; if so, do the following.
total_sum=(total_sum?total_sum",":"")sum ##Appending the previous file's sum to total_sum, comma-separated.
} ##Closing BLOCK for if condition here.
sum="" ##Nullifying variable sum here.
header=(header?header",":"")FILENAME ##Appending the current FILENAME to header, comma-separated.
} ##Closing BLOCK for FNR==1 condition here.
{
sum+=$2 ##Adding $2 of every line to sum while the current Input_file is being read.
}
END{ ##Starting the END section of this awk code from here.
print header ORS total_sum","sum ##Printing header, ORS (a newline), then total_sum, a comma, and the last file's sum.
}
' file* ##Mentioning Input_file names here.
Output will be as follows.
file1,file2,file3
21,30,35
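
If you have GNU awk, its gawk-only ENDFILE special pattern gives a shorter sketch of the same idea, since it runs once after each input file is exhausted:
gawk '{ sum+=$2 }
ENDFILE { header=(header?header",":"")FILENAME; totals=(totals?totals",":"")sum; sum=0 }
END { print header ORS totals }' f1 f2 f3 > f4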

Store each of the first 2 blocks of lines in arrays

I've been sorting this with Google Sheets, but it takes a long time, so I figured I'd handle it with awk instead.
input.txt
Column 1
2
2
2
4
4

Column 2
562
564
119
215
12

Range
13455,13457
13161
11409
13285,13277-13269
11409
I've tried this script to rearrange the values:
awk '/Column 1/' RS= input.txt
(as referenced in How can I set the grep after context to be "until the next blank line"?)
But it seems it only takes one matched block. The values should be matched up line by line across the blocks.
Result:
562Value2#13455
562Value2#13457
564Value2#13161
119Value2#11409
215Value4#13285
215Value4#13277-13269
12Value4#11409
It should work like that: a comma in Range means the corresponding values from Column 1 and Column 2 are repeated for each part. For example:
Range :
13455,13457
Result :
562Value2#13455
562Value2#13457
I don't know what sorting has to do with it, but it seems like this is what you're looking for:
$ cat tst.awk
BEGIN { FS=","; recNr=1; print "Result:" }
!NF { ++recNr; lineNr=0; next }
{ ++lineNr }
lineNr == 1 { next }
recNr == 1 { a[lineNr] = $0 }
recNr == 2 { b[lineNr] = $0 }
recNr == 3 {
for (i=1; i<=NF; i++) {
print b[lineNr] "Value" a[lineNr] "#" $i
}
}
$ awk -f tst.awk input.txt
Result:
562Value2#13455
562Value2#13457
564Value2#13161
119Value2#11409
215Value4#13285
215Value4#13277-13269
12Value4#11409
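
Since the question's own attempt used RS= (paragraph mode), here is a sketch of the same logic in that style, assuming the blocks really are separated by blank lines; with RS="" and FS="\n" each block becomes one record and each line one field (tst2.awk is just a hypothetical name):
$ cat tst2.awk
BEGIN { RS=""; FS="\n"; print "Result:" }
NR==1 { for (i=2; i<=NF; i++) a[i]=$i }   # block 1: Column 1 values, skipping the header field
NR==2 { for (i=2; i<=NF; i++) b[i]=$i }   # block 2: Column 2 values
NR==3 { for (i=2; i<=NF; i++) {           # block 3: Range; a comma repeats the prefix
            n = split($i, parts, ",")
            for (j=1; j<=n; j++) print b[i] "Value" a[i] "#" parts[j]
        } }
$ awk -f tst2.awk input.txt
Result:
562Value2#13455
562Value2#13457
564Value2#13161
119Value2#11409
215Value4#13285
215Value4#13277-13269
12Value4#11409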

compare multiple columns and only replace if matching

I have two files (File 1 and File 2)
I am trying to compare the strings in columns 1 and 2 of FILE1 with columns 4 and 5 of FILE2. In addition to that match, column 6 of FILE2 also needs to match a certain string, SO or CO (because columns 3 and 4 of FILE1 are SO and CO respectively); when everything matches, column 7 of FILE2 should be replaced with the corresponding value from FILE1, and all other lines kept unchanged.
I tried to modify and use a solution provided in the forum for a similar problem, but it did not work.
FILE1
type code SO CO other
7757 1 6941.958 138.922 149.17
7757 2 8666.123 198.908 225.67
7757 4 2795.885 334.875 378.68
7759 GT3 222.104 13.5 734.62
7768 CT2 0 0 0
7805 6 3796.677 75.175 79.09
FILE2
"US","01073",,"7757","1","SO","10","299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO","10","299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO","10","299"
Required output:
"US","01073",,"7757","1","SO","6941.958","299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO","138.922","299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO","75.175","299"
Solution I tried (for CO only):
tr -d '"' < FILE2 > temp # to remove double quotes
awk 'NR==FNR{A[$1,$2]=$3;next} A[$4,$5] && $6=="CO" {$7=A[$1,$2]; print}' FS=" " OFS="," FILE1 temp > out
Complex awk solution:
awk 'function unquote(f){
return substr(f, 2, length(f)-2)
}
NR==FNR{
if (NR==1){ f3=$3; f4=$4 }
else if (NF){ a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 }
next;
}
{ k=unquote($4) SUBSEP unquote($5) SUBSEP unquote($6) }
k in a{ $7=a[k] }1' FILE1 FS=',' OFS=',' FILE2
function unquote(f) { ... } - unquotes a field, i.e. extracts the value between the double quotes (in fact, between the first and last characters of the string)
a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 - indexes each SO and CO value from FILE1 by type, code, and column name
The output:
"US","01073",,"7757","1","SO",6941.958,"299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO",138.922,"299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO",75.175,"299"

How to detect the last line in awk before END?

I'm trying to sum up consecutive String values and print the totals, but if the last lines are Strings, so there is no change of type at the end, the final total never gets printed:
input.txt:
String 1
String 2
Number 5
Number 2
String 3
String 3
awk:
awk '
BEGIN { tot=0; ant_t=""; }
{
t = $1; val=$2;
#if string, concatenate its value
if (t == "String") {
tot+=val;
nx=1;
} else {
nx=0;
}
#if type change, add tot to res
if (t != "String" && ant_t == "String") {
res=res tot;
tot=0;
}
ant_t=t;
#if string, go next
if (nx == 1) {
next;
}
res=res"\n"val;
}
END { print res; }' input.txt
Current output:
3
5
2
Expected output:
3
5
2
6
How can I detect that awk is reading the last line, so that even when there is no change of type it will still print the final total?
awk reads its input line by line, so it cannot know whether the line it is currently reading is the last one. The END block is the place to perform actions once the end of the input has been reached.
To achieve what you expect:
awk '/String/{sum+=$2} /Number/{if(sum) print sum; sum=0; print $2} END{if(sum) print sum}'
will produce output as
3
5
2
6
What it does:
/String/ selects lines that match String; likewise /Number/ selects lines that match Number.
sum+=$2 accumulates the values of the String lines. When a Number line occurs, the accumulated sum is printed and reset to zero.
Like this maybe:
awk -v lines="$(wc -l < /etc/hosts)" 'NR==lines{print "LAST"};1' /etc/hosts
I am pre-calculating the number of lines (using wc) and passing that into awk as a variable called lines, if that is unclear.
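
If you'd rather not shell out to wc, a sketch of the same idea reads the file twice and counts the lines on the first pass (this requires a regular file, not a pipe):
awk 'NR==FNR { lines=NR; next } FNR==lines { print "LAST" } 1' input.txt input.txt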
Just change last line to:
END { print res; print tot;}'
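One caveat: if the input instead ends with a Number line, tot is 0 in the END block and a stray 0 line gets printed. Assuming the values are positive, as in the sample, guarding the print avoids that:
END { print res; if (tot) print tot }'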
awk '$1~"String"{x+=$2;y=1}$1~"Number"{if (y){print x;x=0;y=0;}print $2}END{if(y) print x}' file
Explanation
y is used as a boolean; at the END I check whether the last pattern was a String and, if so, print the sum.
You can actually use x as the boolean like nu11p01n73R does, which is smarter.
Test
$ cat file
String 1
String 2
Number 5
Number 2
String 3
String 3
$ awk '$1~"String"{x+=$2;y=1}$1~"Number"{if (y){print x;x=0;y=0;}print $2}END{if(y) print x}' file
3
5
2
6

Awk count frequency

I want to count the frequency of a value in a certain column with awk.
An example dataset is:
2 5 8
1 3 7
8 5 9
and I want to count the frequency of the 5 in the second column. This is what I tried, but it didn't work:
{
total = 0;
for(i=1;i<=NF;i++)
{
if(i==2)
{if($i==5) {total++;}
}
printf("%s ", total);
}
}
How about the following:
awk '{ if ($2==5) count++ } END { print count }'
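A small caveat with this sketch: if no line has 5 in column 2, count is never set and print count emits an empty line; adding +0 forces a numeric 0 instead:
awk '{ if ($2==5) count++ } END { print count+0 }'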
awk 'NR == 1 {ind = 0} $2 == 5 {ind++} END {print ind}' testdata.txt
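
And if you want the frequency of every value in column 2 rather than just 5, a small extension with an associative array works; data.txt here stands for the sample file, and the output order of for (v in freq) is unspecified:
$ awk '{ freq[$2]++ } END { for (v in freq) print v, freq[v] }' data.txt
3 1
5 2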