If I have:
1 2 3 4 5 6 . .
3 4 5 4 2 1 . .
5 7 5 7 2 0 . .
.
.
I want to show the difference of adjacent data rows, so that it would show:
2 2 2 0 -3 -5 . .
2 3 0 3 0 -1 . .
.
.
I found the post "difference between numbers in the same column using AWK", and adapting the second answer, I thought this would do the job:
awk 'NR>1{print $0-p} {p=$0}' file
But that produces a single column of output. How do I get it to retain the column structure of the data?
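(A side note on why the adapted one-liner collapses to one column: $0 - p is numeric subtraction, and awk converts each whole line to a number using only its leading numeric prefix, so only the first column takes part. A quick check on two short rows:)

```shell
# $0 - p subtracts whole lines as numbers; awk keeps only each line's leading number.
printf '1 2 3\n3 4 5\n' | awk 'NR>1 { print $0 - p } { p = $0 }'
# prints: 2
```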
$ cat tst.awk
NR>1 {
    for (i=1; i<=NF; i++) {
        printf "%2d%s", $i - p[i], (i<NF ? OFS : ORS)
    }
}
{ split($0,p) }
$ awk -f tst.awk file
2 2 2 0 -3 -5
2 3 0 3 0 -1
Try something like this:
awk 'NR > 1 { for (i = 1; i <= NF; i++) printf "%s%s", $i - p[i], (i < NF ? OFS : ORS) }
     { for (i = 1; i <= NF; i++) p[i] = $i }' numbers
Written out:
$ cat > subtr.awk
{
    for (i=1; i<=NF; i++) b[i]=a[i]
    # for (i in a) b[i]=a[i]
    n=split($0,a)
}
NR > 1 {
    for (i=1; i<=NF; i++) {
    # for (i in a) {
        printf "%s%s", a[i]-b[i], (i==n?ORS:OFS)
    }
    delete b
}
Test it:
$ awk -f subtr.awk file
2 2 2 0 -3 -5
2 3 0 3 0 -1
A text file contains multiple tab-delimited columns between header strings, as in the example below.
Code 1 (3)
5 10 7 1 1
6 10 9 1 1
7 10 10 1 1
Code 2 (2)
9 11 3 1 3
10 8 5 2 1
Code 3 (1)
12 10 2 1 1
Code 4 (2)
14 8 1 1 3
15 8 7 5 1
I would like to average the numbers in the third column for each code block. The example below is what the output should look like.
8.67
4
2
4
Attempt 1
awk '$3~/^[[:digit:]]/ {i++; sum+=$3; print $3} $3!~/[[:digit:]]/ {print sum/i; sum=0;i=0}' in.txt
Returned fatal: division by zero attempted.
Attempt 2
awk -v OFS='\t' '/^Code/ { if (NR > 1) {i++; sum+=$3;} {print sum/i;}}' in.txt
Returned another division by zero error.
Attempt 3
awk -v OFS='\t' '/^Code/ { if (NR > 1) { print s/i; s=0; i=0; } else { s += $3; i += 1; }}' in.txt
Returned 1 value: 0.
Attempt 4
awk -v OFS='\t' '/^Code/ {
if (NR > 1)
i++
print sum += $3/i
}
END {
i++
print sum += $3/i
}'
Returned:
0
0
0
0.3
I am not sure where that last number is coming from, but this has been the closest solution so far. I am getting a number for each block, but not the average.
Could you please try the following.
awk '
/^Code/{
    if (value) {
        print sum/value
    }
    sum=value=""
    next
}
{
    sum+=$3
    value++
}
END{
    if (value) {
        print sum/value
    }
}
' Input_file
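If you want the averages rounded to two decimals like the 8.67 in the desired output (a plain print gives 8.66667), a printf variant of the same idea should work. This is a sketch; it assumes every block header begins with "Code" as in the sample, and note that %.2f keeps trailing zeros (4.00 rather than 4):

```shell
# Recreate the sample input from the question.
cat > in.txt <<'EOF'
Code 1 (3)
5 10 7 1 1
6 10 9 1 1
7 10 10 1 1
Code 2 (2)
9 11 3 1 3
10 8 5 2 1
Code 3 (1)
12 10 2 1 1
Code 4 (2)
14 8 1 1 3
15 8 7 5 1
EOF
# On each "Code" header, print the average of the block just finished; END flushes the last one.
awk '/^Code/ { if (n) printf "%.2f\n", sum/n; sum = n = 0; next }
     { sum += $3; n++ }
     END { if (n) printf "%.2f\n", sum/n }' in.txt
# prints:
# 8.67
# 4.00
# 2.00
# 4.00
```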
I would like to replace column values of ref using key-value pairs from id.
cat id:
[1] a 8-23
[2] g 8-21
[3] d 8-13
cat ref:
a 1 2
b 3 4
c 5 3
d 1 2
e 3 1
f 1 2
g 2 3
desired output
8-23 1 2
b 3 4
c 5 3
8-13 1 2
e 3 1
f 1 2
8-21 2 3
I assume it would be best done using awk.
cat replace.awk
BEGIN { OFS="t" }
NR==FNR {
a[$2]=$3; next
}
$1 in !{!a[#]} {
print $0
}
Not sure what I need to change?
$1 in !{!a[#]} is not awk syntax. You just need $1 in a:
BEGIN { OFS = "\t" }
NR==FNR {
    a[$2] = $3
    next
}
{
    $1 = ($1 in a) ? a[$1] : $1
    print
}
To force OFS to take effect in the output, this version always assigns to $1 (even when there is nothing to replace); print with no argument prints $0.
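A quick way to see that rebuild in action: a self-assignment like $1 = $1 is enough to make awk reassemble $0 with the new separator:

```shell
# Assigning to any field rebuilds $0 using the current OFS.
printf 'a 1 2\n' | awk -v OFS='\t' '{ print; $1 = $1; print }'
# first line is unchanged ("a 1 2"); second is tab-separated ("a<TAB>1<TAB>2")
```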
I have data which looks like this
1 3
1 2
1 9
5 4
4 6
5 6
5 8
5 9
4 2
I would like the output to be
1 3,2,9
5 4,6,8,9
4 6,2
This is just sample data but my original one has lots more values.
So this worked. It basically creates a hash table, using the first column as the key and appending the second column of each line to the value:
awk '{for (i = 2; i <= NF; i++) table[$1] = (table[$1] == "" ? $i : table[$1] ", " $i)} END {for (key in table) print key " => " table[key]}' trial.txt
OUTPUT
4 => 6, 2
5 => 4, 6, 8, 9
1 => 3, 2, 9
I'd write
awk -v OFS=, '
    {
        key = $1
        $1 = ""
        values[key] = values[key] $0
    }
    END {
        for (key in values) {
            sub(/^,/, "", values[key])
            print key " " values[key]
        }
    }
' file
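Running that against the sample data (piped through sort here, since for (key in values) visits keys in an unspecified order):

```shell
printf '1 3\n1 2\n1 9\n5 4\n4 6\n5 6\n5 8\n5 9\n4 2\n' |
awk -v OFS=, '
    # Blanking $1 rebuilds $0 with OFS=",", leaving ",<value>" to append.
    { key = $1; $1 = ""; values[key] = values[key] $0 }
    END { for (key in values) { sub(/^,/, "", values[key]); print key " " values[key] } }
' | sort
# 1 3,2,9
# 4 6,2
# 5 4,6,8,9
```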
If you want only the unique values for each key (requires GNU awk for multi-dimensional arrays)
gawk -v OFS=, '
    { for (i=2; i<=NF; i++) values[$1][$i] = i }
    END {
        for (key in values) {
            printf "%s ", key
            sep = ""
            for (val in values[key]) {
                printf "%s%s", sep, val
                sep = ","
            }
            print ""
        }
    }
' file
or perl
perl -lane '
$key = shift @F;
$values{$key}{$_} = 1 for @F;
} END {
$, = " ";
print $_, join(",", keys %{$values{$_}}) for keys %values;
' file
If you're not concerned with the order of the keys, I think this is the idiomatic awk solution:
$ awk '{a[$1]=($1 in a?a[$1]",":"") $2}
END{for(k in a) print k,a[k]}' file |
column -t
4 6,2
5 4,6,8,9
1 3,2,9
The data is something like:
"1||2""3""2||3""5""4||3""6""43""4||4||3""4||3", 43 ,"4||3""43""3||4||4||3"
I've tried this myself:
BEGIN {
    FPAT = "(\"[^\"]+\")|([ ])"
}
{
    print "NF = ", NF
    for (i = 1; i <= NF; i++) {
        printf("$%d = <%s>\n", i, $i)
    }
}
but the problem is that it gives me output like:
$ gawk -f prog4.awk data1.txt
NF = 18
$1 = <"1||2">
$2 = <"3">
$3 = <"2||3">
$4 = <"5">
$5 = <"4||3">
$6 = <"6">
$7 = <"43">
$8 = <"4||4||3">
$9 = <"4||3">
$10 = <,>
$11 = < >
$12 = <4>
$13 = <3>
$14 = < >
$15 = <,>
$16 = <"4||3">
$17 = <"43">
$18 = <"3||4||4||3">
As you can see, for $10 through $15 every single character (commas, spaces, digits) becomes its own field. Help appreciated.
Let's try approaching this a different way - if the following is not what you are looking for, please tell us in what way(s) it differs from your desired output and why:
$ cat tst.awk
BEGIN { FPAT="\"[^\"]+\"" }
{
    for (i=1; i<=NF; i++) {
        print i, "<" $i ">"
    }
}
$
$ gawk -f tst.awk file
1 <"1||2">
2 <"3">
3 <"2||3">
4 <"5">
5 <"4||3">
6 <"6">
7 <"43">
8 <"4||4||3">
9 <"4||3">
10 <"4||3">
11 <"43">
12 <"3||4||4||3">
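The reason this works: with FPAT, gawk keeps only the substrings that match the pattern and silently skips everything between matches, so the commas, the spaces, and the unquoted 43 never become fields. A minimal check (requires gawk; mawk and BWK awk don't support FPAT):

```shell
# Only quoted runs match FPAT; the bare 43 and the punctuation are skipped.
echo '"1||2""3", 43 ,"4||3"' |
gawk 'BEGIN { FPAT = "\"[^\"]+\"" } { for (i = 1; i <= NF; i++) print i, $i }'
# 1 "1||2"
# 2 "3"
# 3 "4||3"
```

Whether dropping the unquoted 43 is acceptable depends on your desired output, which is exactly what the answer above asks you to clarify.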
I have a file with a variable number of columns:
Input:
1 1 2
2 1 5
5 2 3
7 0 -1
4 1 4
I want to print the max and min of each column:
Desired output:
max: 7 2 5
min: 1 0 -1
For a single column, e.g. $1, I know I can find the max and min using something like:
awk '{if(min==""){min=max=$1}; if($1>max) {max=$1}; if($1<min) {min=$1};} END {printf "%.2g %.2g\n", min, max}'
Question
How can I extend this to loop over all columns (not necessarily just the 3 in my example)?
Many thanks!
awk 'NR==1{for(i=1;i<=NF;i++)min[i]=max[i]=$i}
     {for(i=1;i<=NF;i++){if($i<min[i]){min[i]=$i}else if($i>max[i])max[i]=$i}}
     END{printf "max:\t"; for(i=1;i<=NF;i++) printf "%d ",max[i];
         printf "\nmin:\t"; for(i=1;i<=NF;i++) printf "%d ",min[i]; print ""}' input.txt
input.txt:
1 1 2 2
2 1 5 3
5 2 3 10
7 0 -1 0
4 1 4 5
output:
max: 7 2 5 10
min: 1 0 -1 0
Like this
awk 'NR==1{for(i=1;i<=NF;i++){xmin[i]=$i;xmax[i]=$i}}
{for(i=1;i<=NF;i++){if($i<xmin[i])xmin[i]=$i;if($i>xmax[i])xmax[i]=$i}}
END{for(i=1;i<=NF;i++)print xmin[i],xmax[i]}' file
Let's try to make it a bit shorter by using the min = (current < min ? current : min) expression. This ternary operator is the same as saying if (current < min) min = current.
Also, printf "%.2g%s", min[i], (i==NF?"\n":" ") in the END{} block prints a space after each value and a newline after the last field.
awk 'NR==1{for (i=1; i<=NF; i++) {min[i]=max[i]=$i}; next}
     {for (i=1; i<=NF; i++) { min[i]=(min[i]>$i?$i:min[i]); max[i]=(max[i]<$i?$i:max[i]) }}
     END {printf "min: "; for (i=1;i<=NF;i++) printf "%.2g%s", min[i], (i==NF?"\n":" ");
          printf "max: "; for (i=1;i<=NF;i++) printf "%.2g%s", max[i], (i==NF?"\n":" ")}' file
Sample output:
$ awk 'NR==1{for (i=1; i<=NF; i++) {min[i]=max[i]=$i}; next} {for (i=1; i<=NF; i++) { min[i]=(min[i]>$i?$i:min[i]); max[i]=(max[i]<$i?$i:max[i]) }} END {printf "min: "; for (i=1;i<=NF;i++) printf "%.2g%s", min[i], (i==NF?"\n":" "); printf "max: "; for (i=1;i<=NF;i++) printf "%.2g%s", max[i], (i==NF?"\n":" ")}' file
min: 1 0 -1
max: 7 2 5