Clubbing two lines into one using AWK - awk

I have a file with the following format:
Time Number Val
x 1 y
x 1 y
a 1 z
b 1 m
b 2 m
I want to club lines with the same Time value, summing the Number column. The final file should be something like this:
Time Number Val
x 2 y
a 1 z
b 3 m
How can I do this using awk?

You can use awk's associative array:
awk 'NR==1{print $0} NR!=1{a[$1]+=$2; b[$1]=$3;} \
END{ for ( i in a) print i, a[i], b[i]}' file
For your sample input, it prints:
Time Number Val
x 2 y
a 1 z
b 3 m
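One caveat: POSIX leaves the iteration order of "for (i in a)" unspecified, so the rows may come out in any order. A sketch (with the sample data written to a hypothetical input.txt so it runs standalone) that preserves the input's first-seen key order:

```shell
# The sample input, written inline so this runs standalone.
printf 'Time Number Val\nx 1 y\nx 1 y\na 1 z\nb 1 m\nb 2 m\n' > input.txt

# Same aggregation, but ord[] remembers first-seen key order, since
# "for (i in a)" iterates in an unspecified order in POSIX awk.
awk 'NR==1{print; next}
     {if (!($1 in a)) ord[++n]=$1; a[$1]+=$2; b[$1]=$3}
     END{for (j=1; j<=n; j++) print ord[j], a[ord[j]], b[ord[j]]}' input.txt
```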

Count all duplicate Time and Val combinations:
awk 'NR>1{a[$1,$3]+=$2;next}$1=$1;END{for(k in a){split(k,s,SUBSEP);print s[1],a[k],s[2]}}' OFS="\t" file
Time Number Val
a 1 z
b 3 m
x 2 y
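As above, "for (k in a)" visits keys in an unspecified order, so the sorted-looking output is not guaranteed on every awk. A portable sketch that sorts the aggregated rows externally (GNU awk users could instead set PROCINFO["sorted_in"]); the inline input.txt is just the sample data:

```shell
# The sample input, written inline so this runs standalone.
printf 'Time Number Val\nx 1 y\nx 1 y\na 1 z\nb 1 m\nb 2 m\n' > input.txt

# Print the header as-is, then sort the aggregated rows externally so
# the order no longer depends on awk's array traversal.
head -n 1 input.txt
awk 'NR>1{a[$1 OFS $3]+=$2}
     END{for (k in a){split(k, s, OFS); print s[1], a[k], s[2]}}' OFS="\t" input.txt | sort
```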

Related

For each unique occurrence in a field, print the sum of the corresponding numerical field and the number of occurrences/counts

I have a file
a x 0 3
a x 0 1
b,c x 4 4
dd x 3 5
dd x 2 5
e,e,t x 5 7
a b 1 9
cc b 2 1
cc b 1 1
e,e,t b 1 2
e,e,t b 1 2
e,e,t b 1 2
For each combination of $1 and $2, I want to print the sums of $3 and $4 and the number of occurrences/length/counts,
so that I have:
a x 0 4 0 2
b,c x 4 4 1 1
dd x 5 10 2 2
e,e,t x 5 7 1 1
a b 1 9 1 1
cc b 3 2 2 2
e,e,t b 3 6 3 3
I am using
awk -F"\t" '{for(n=2;n<=NF; ++n) a[$1 OFS $2][n]+=$n}
END {for(i in a) {
printf "%s", i
for (n=3; n<=4; ++n) printf "\t%s", a[i][n], a[i][n]++
printf "\n" }}' file
but it's only printing the sums, not the counts.
The actual file has many columns: the keys are $4, $6, $7 and $8, and the numerical columns are $9-$13.
You may use this awk:
cat sum.awk
BEGIN {
FS = OFS = "\t" # set input/output FS to tab
}
{
k = $1 OFS $2 # create key using $1 tab $2
if (!(k in map3)) # if k is not in map3 save it in an ordered array
ord[++n] = k
map3[k] += $3 # sum of $3 in array map3 using key as k
$3 > 0 && ++fq3[k] # frequency of $3 if it is > 0
map4[k] += $4 # sum of $4 in array map4 using key as k
$4 > 0 && ++fq4[k] # frequency of $4 if it is > 0
}
END {
for(i=1; i<=n; ++i) # print everything by looping through ord array
print ord[i], map3[ord[i]], map4[ord[i]], fq3[ord[i]]+0, fq4[ord[i]]+0
}
Then use it as:
awk -f sum.awk file
a x 0 4 0 2
b,c x 4 4 1 1
dd x 5 10 2 2
e,e,t x 5 7 1 1
a b 1 9 1 1
cc b 3 2 2 2
e,e,t b 3 6 3 3
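The same pattern generalizes to the real layout mentioned in the question (keys in $4, $6, $7, $8; numeric columns $9-$13). A sketch with made-up tab-separated data in a hypothetical wide.txt, since the actual file isn't shown:

```shell
# Hypothetical 13-column, tab-separated input matching the question's
# description (keys in $4,$6,$7,$8; sums over $9-$13); values are made up.
printf 'c1\tc2\tc3\tk1\tc5\tk2\tk3\tk4\t1\t2\t0\t4\t5\n' >  wide.txt
printf 'c1\tc2\tc3\tk1\tc5\tk2\tk3\tk4\t2\t0\t3\t1\t0\n' >> wide.txt

awk '
BEGIN { FS = OFS = "\t" }
{
  k = $4 OFS $6 OFS $7 OFS $8                  # composite key from the key columns
  if (!(k in seen)) { seen[k] = 1; ord[++n] = k }
  for (c = 9; c <= 13; ++c) {
    sum[k, c] += $c                            # running sum per key/column
    if ($c > 0) ++cnt[k, c]                    # count of values > 0
  }
}
END {
  for (i = 1; i <= n; ++i) {
    k = ord[i]; line = k
    for (c = 9; c <= 13; ++c) line = line OFS sum[k, c]
    for (c = 9; c <= 13; ++c) line = line OFS (cnt[k, c] + 0)
    print line
  }
}' wide.txt
```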

Count the number of occurrences of numbers larger than x in every row

I have a file with multiple rows and 26 columns. I want to count the number of occurrences of values that are higher than 0 (I guess checking for values different from 0 is also valid) in each row, excluding the first two columns. The file looks like this:
X Y Sample1 Sample2 Sample3 .... Sample24
a a1 0 7 0 0
b a2 2 8 0 0
c a3 0 3 15 3
d d3 0 0 0 0
I would like to have an output file like this:
X Y Result
a a1 1
b a2 2
c a3 3
d d3 0
awk or sed would be good.
I saw a similar question but in that case the columns were summed and the desired output was different.
awk 'NR==1{printf "X\tY\tResult%s",ORS} # Printing the header
NR>1{
count=0; # Initializing count for each row to zero
for(i=3;i<=NF;i++){ #iterating from field 3 to end, NF is #fields
if($i>0){ # $i expands to $3, $4 and so on, i.e. the fields
count++; # Incrementing if the condition is true.
}
};
printf "%s\t%s\t%s%s",$1,$2,count,ORS # For each row print o/p
}' file
should do that
another awk
$ awk '{if(NR==1) c="Result";
else for(i=3;i<=NF;i++) c+=($i>0);
print $1,$2,c; c=0}' file | column -t
X Y Result
a a1 1
b a2 2
c a3 3
d d3 0
$ awk '{print $1, $2, (NR>1 ? gsub(/ [1-9]/,"") : "Result")}' file
X Y Result
a a1 1
b a2 2
c a3 3
d d3 0
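One caveat with the gsub() one-liner: gsub() returns the number of substitutions made, and the regex only matches a space followed by a nonzero digit, so it assumes single-space separation and misses nonzero values that don't start with 1-9 (e.g. 0.5). A quick sketch on hypothetical decimal data in dec.txt showing the difference:

```shell
# Hypothetical data with a decimal value, to show the assumption.
printf 'X Y S1 S2\ne e1 0.5 2\n' > dec.txt

# The gsub() trick counts " <1-9>" matches, so the nonzero 0.5 is missed:
awk '{print $1, $2, (NR>1 ? gsub(/ [1-9]/,"") : "Result")}' dec.txt

# A numeric comparison counts it correctly:
awk 'NR>1{c=0; for(i=3;i<=NF;i++) c+=($i>0); print $1, $2, c}' dec.txt
```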

matching rows and fields from two files

I would like to match the record number in one file with the same field number in another file:
file1:
1
3
5
4
3
1
5
file2:
A B C D E F G
H I J J K L M
N O P Q R S T
I would like to use the record numbers corresponding to 5 in the first file to obtain the corresponding fields in the second file. Desired output:
C G
J M
P T
So far, I've done:
awk '{ if ($1=="5") print NR }' file1 > temp
for i in $(cat temp); do
awk '{ print $"'${i}'" }' file2
done
But get the output:
C
J
P
G
M
T
I would like to have this in the format of the desired output above, but can't get it to work. Perhaps printf or an awk for-loop might work, but I have had no success.
Thank you all.
awk 'NR==FNR{if($1==5)a[NR];next}{for(i in a){printf "%s ", $i}print ""}' file1 file2
C G
J M
P T
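Since "for (i in a)" visits keys in an unspecified order, the columns could come out shuffled on some awks. A sketch that stores the matching record numbers in an ordered list instead (the question's two files are written inline so it runs standalone):

```shell
# The question's two input files, written inline so this runs standalone.
printf '1\n3\n5\n4\n3\n1\n5\n' > file1
printf 'A B C D E F G\nH I J J K L M\nN O P Q R S T\n' > file2

# Store matching record numbers in an ordered list (idx) so the output
# columns always appear in file1 order.
awk 'NR==FNR{if($1==5) idx[++n]=NR; next}
     {line=""; for(j=1;j<=n;j++){f=idx[j]; line=line (j>1?" ":"") $f}; print line}' file1 file2
```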

How to extract data at a specific location from a file containing a grid of data points

I have a file containing a 3D grid (x, y, time), with a property "v" at each grid point. I want to extract the time profile of "v" at a particular x, y point, or more specifically, at the x, y point closest to my desired location (it is unlikely that the desired location will exactly fall on a grid point). Is there an easy awk script for this when the file is in either ascii or binary format?
Example of file format
X Y Time V
1 1 0 2
1 1 10 3
1 1 20 4
1 2 0 3
1 2 10 8
1 2 20 11
1 3 0 3
Example of desired output if location of interest is x=0.9, y=2.1
1 2 0 3
1 2 10 8
1 2 20 11
$ cat tst.awk
function abs(val) { return (val < 0 ? -val : val) }
BEGIN { ARGV[ARGC] = ARGV[ARGC-1]; ARGC++ }
NR==FNR {
if (NR>1) {
dist[NR] = abs(x - $1) + abs(y - $2)
min = (NR==2 || dist[NR]<min ? dist[NR] : min)
}
next
}
FNR==1 || dist[FNR] == min
$ awk -v x=0.9 -v y=2.1 -f tst.awk file
X Y Time V
1 2 0 3
1 2 10 8
1 2 20 11
Just check that the algorithm to calculate dist[] is what you need and tweak it to suit otherwise.
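For example, the script above uses Manhattan distance (sum of absolute differences). If Euclidean distance is wanted instead, it is enough to compare squared distances, since taking the square root would not change which point is closest. A self-contained sketch on the sample data (hypothetical filename grid.txt):

```shell
# The sample grid, written inline so this runs standalone.
printf 'X Y Time V\n1 1 0 2\n1 1 10 3\n1 1 20 4\n' >  grid.txt
printf '1 2 0 3\n1 2 10 8\n1 2 20 11\n1 3 0 3\n' >> grid.txt

# Same two-pass idea, but comparing squared Euclidean distances;
# the file is simply named twice instead of using the ARGV trick.
awk -v x=0.9 -v y=2.1 '
  NR==FNR { if (NR > 1) {
              d[NR] = (x-$1)^2 + (y-$2)^2
              min = (NR==2 || d[NR] < min ? d[NR] : min) }
            next }
  FNR==1 || d[FNR] == min
' grid.txt grid.txt
```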

awk: delete first and last entry of comma-separated field

I have a 4 column data that looks something like the following:
a 1 g 1,2,3,4,5,6,7
b 2 g 3,5,3,2,6,4,3,2
c 3 g 5,2,6,3,4
d 4 g 1,5,3,6,4,7
I am trying to delete the first two numbers and the last two numbers in the entire fourth column, so the output looks like the following:
a 1 g 3,4,5
b 2 g 3,2,6,4
c 3 g 6
d 4 g 3,6
Can someone help me with this? I would appreciate it.
You can use this:
$ awk '{n=split($4, a, ","); for (i=3; i<=n-2; i++) t=t""a[i](i==n-2?"":","); print $1, $2, $3, t; t=""}' file
a 1 g 3,4,5
b 2 g 3,2,6,4
c 3 g 6
d 4 g 3,6
Explanation
n=split($4, a, ",") slices the 4th field in pieces, based on comma as delimiter. As split() returns the number of pieces we got, we store it in n to work with it later on.
for (i=3; i<=n-2; i++) t=t""a[i](i==n-2?"":",") builds the new fourth field in t, looping over the remaining slices and joining them with commas.
print $1, $2, $3, t; t="" prints the new output and blanks the variable t.
This will work for your posted sample input:
$ awk '{gsub(/^([^,]+,){2}|(,[^,]+){2}$/,"",$NF)}1' file
a 1 g 3,4,5
b 2 g 3,2,6,4
c 3 g 6
d 4 g 3,6
If you have cases where there's less than 4 commas in your 4th field then update your question to show how those should be handled.
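If such short fields do occur, a guarded sketch of the split() approach from the first answer could skip them (here the fourth column is left empty when fewer than 5 values exist; adjust to taste). The data.txt filename and the short-field row are hypothetical:

```shell
# Hypothetical input including a field with fewer than 5 values.
printf 'a 1 g 1,2,3,4,5,6,7\nb 2 g 3,5\nc 3 g 5,2,6,3,4\n' > data.txt

# Strip two values from each end only when at least 5 values exist;
# otherwise the fourth column is left empty.
awk '{n=split($4, a, ","); t=""
      if (n >= 5) for (i=3; i<=n-2; i++) t = t a[i] (i<n-2 ? "," : "")
      print $1, $2, $3, t}' data.txt
```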
This uses bash array manipulation. It may be a little ... gnarly:
while read -a fields; do # read the fields for each line
IFS=, read -a values <<< "${fields[3]}" # split the last field on comma
new=("${values[@]:2:${#values[@]}-4}") # drop the first 2 and last 2 values
fields[3]=$(IFS=,; echo "${new[*]}") # join the new list on comma
printf "%s\t" "${fields[@]}"; echo # print the new line
done <<END
a 1 g 1,2,3,4,5,6,7
b 2 g 3,5,3,2,6,4,3,2
c 3 g 5,2,6,3,4
d 4 g 1,5,3,6,4,7
END
a 1 g 3,4,5
b 2 g 3,2,6,4
c 3 g 6
d 4 g 3,6