Merging multiple files with null values using AWK - awk

Sorry I am posting it again as i messed up in my earlier post:
I am interesting in joining multiple files (e.g., file 1 file2 file 3...) using matching values in column 1 and get this desired output. Would appreciate any help please:
file1:
A 2 3 4
B 3 7 8
C 4 6 9
file2:
A 7 6 3
C 2 4 7
D 1 6 4
file3:
A 3 2 7
B 4 7 3
E 3 6 8
Output:
A 2 3 4 7 6 3 3 2 7
B 3 7 8 n n n 4 7 3
C 4 6 9 2 4 7 n n n
D n n n 1 6 4 n n n
E n n n n n n 3 6 8

Here is one for awk. Tested with GNU awk, mawk, original-awk ie. awk 20121220 and Busybox awk:
$ awk '
function nn(c, b,i) {
if(c)
for(i=1;i<=c;i++)
b=b "n "
return b
}
FNR==1{nf+=(NF-1)}
{
for(i=2;i<=NF;i++)
b[$1]=b[$1] $i OFS
a[$1]=a[$1] (n[$1]<(nf-NF+1)?nn(nf-NF+1-n[$1]):"") b[$1]
n[$1]=nf+0
delete b[$1]
}
END{
for(i in a)
print i,a[i] (n[i]<(nf)?nn(nf-n[i]):"")
}' file1 file2 file3
Output:
A 2 3 4 7 6 3 3 2 7
B 3 7 8 n n n 4 7 3
C 4 6 9 2 4 7 n n n
D n n n 1 6 4 n n n
E n n n n n n 3 6 8

Related

awk cumulative sum in on dimension

Good afternoon,
I would like to make a cumulative sum for each column and line in awk.
My in file is :
1 2 3 4
2 5 6 7
2 3 6 5
1 2 1 2
And I would like : per column
1 2 3 4
3 7 9 11
5 10 15 16
6 12 16 18
6 12 16 18
And I would like : per line
1 3 5 9 9
2 7 13 20 20
2 5 11 16 16
1 3 4 6 6
I did the sum per column as :
awk '{ for (i=1; i<=NF; ++i) sum[i] += $i}; END { for (i in sum) printf "%s ", sum[i]; printf "\n"; }' test.txt # sum
And per line .
awk '
BEGIN {FS=OFS=" "}
{
sum=0; n=0
for(i=1;i<=NF;i++)
{sum+=$i; ++n}
print $0,"sum:"sum,"count:"n,"avg:"sum/n
}' test.txt
But I would like to print all the lines and columns.
Do you have an idea?
It looks like you have all the correct information available, all you are missing is the printout statements.
Is this what you are looking for?
accumulated sum of the columns:
% cat foo
1 2 3 4
2 5 6 7
2 3 6 5
1 2 1 2
% awk '{ for (i=1; i<=NF; ++i) {sum[i]+=$i; $i=sum[i] }; print $0}' foo
1 2 3 4
3 7 9 11
5 10 15 16
6 12 16 18
accumulated sum of the rows:
% cat foo
1 2 3 4
2 5 6 7
2 3 6 5
1 2 1 2
% awk '{ sum=0; for (i=1; i<=NF; ++i) {sum+=$i; $i=sum }; print $0}' foo
1 3 6 10
2 7 13 20
2 5 11 16
1 3 4 6
Both these make use of the following :
each variable has value 0 by default (if used numerically)
I replace the field $i with what the sum value
I reprint the full line with print $0
row sums with repeated last element
$ awk '{s=0; for(i=1;i<=NF;i++) $i=s+=$i; $i=s}1' file
1 3 6 10 10
2 7 13 20 20
2 5 11 16 16
1 3 4 6 6
$i=s sets the index value (now incremented to NF+1) to the sum and 1 prints the line with that extra field.
columns sums with repeated last row
$ awk '{for(i=1;i<=NF;i++) c[i]=$i+=c[i]}1; END{print}' file
1 2 3 4
3 7 9 11
5 10 15 16
6 12 16 18
6 12 16 18
END{print} repeats the last row
ps. your math seems to be wrong for the row sums

Merging two files by ignoring a line with AWK

I have two files:
file 1:
a 1 2 3
b 1 2 3
c 1 2 3
d 1 2 3
file 2:
hola
l m n o p q
Now I want to merge them in one single file by ignoring the header of file 2 like this:
a 1 2 3 l m n o p q
b 1 2 3
c 1 2 3
d 1 2 3
Does anyone have an idea how to do this?
Same expected output can be achieved without awk also
$ cat file1
a 1 2 3
b 1 2 3
c 1 2 3
d 1 2 3
$ cat file2
hola
l m n o p q
$ pr -mtJS' ' file1 <(tail -n +2 file2)
a 1 2 3 l m n o p q
b 1 2 3
c 1 2 3
d 1 2 3
$ paste -d ' ' file1 <(tail -n +2 file2)
a 1 2 3 l m n o p q
b 1 2 3
c 1 2 3
d 1 2 3
$ awk 'NR==FNR{if(NR>1)a[NR-1]=$0;next}{print $0,a[FNR]}' file2 file1
a 1 2 3 l m n o p q
b 1 2 3
c 1 2 3
d 1 2 3
Brief explanation,
NR==FNR{if(NR>1)a[NR-1]=$0;next}: in file2, omit the header and save from the second record to a[NR-1]. Note: this would also work as the lines in file2 grow up
print $0,a[FNR]: print the combination of the content of $0 in file1 and a[FNR]. FNR would be the record number in file1.

move a specific column down using awk

how can I move second column one row down as shown in the below example?
> input
n an na na
a ae 1 2 3
b be 3 2 1
c 4 4 4
> output
n na na
a an 1 2 3
b be 3 2 1
c be 4 4 4
this awk one-liner does the job for you:
awk '{t=$2;$2=p;p=t}7' file

Extracting Sequential Pattern

Can anyone help me on how to write a script to extract sequential lines?
I was able to find and get a script working to create all the permutations of the given inputs, but that's not what I need.
awk 'function perm(p,s, i) {
for(i=1;i<=n;i++)
if(p==1)
printf "%s%s\n",s,A[i]
else
perm(p-1,s A[i]", ")
}
{
A[++n]=$1
}
END{
perm(n)
}' infile
Unfortunately, I don't understand the script well enough to made a modification (not due to lack of trying).
I need to extract 2 to 5 sequential lines/word patterns.
An illustration of what I need is as follows:
Eg.
inputfile.txt:
A
B
C
D
E
F
G
outputfile.txt:
A B
B C
C D
D E
E F
F G
A B C
B C D
C D E
D E F
E F G
A B C D
B C D E
C D E F
D E F G
A B C D E
B C D E F
C D E F G
Here's a Python answer.
General algorithm:
Load all letters into a list
For n = 2..5, where n is size of the "window". You "slide" that window over the list and print those n characters.
Python is nice for this because of list slicing.
with open('input.txt') as f_in, open('output.txt', 'w') as f_out:
chars = f_in.read().splitlines()
for n in range(2, 6):
for start_window in range(len(chars) - n + 1):
f_out.write(' '.join(chars[start_window:start_window + n]))
f_out.write('\n')
awk to the rescue!
$ awk 'BEGIN{n=1}
FNR==1{n++}
{a[c++]=$0; c=c%n}
FNR>n-1{for(i=c;i<c+n-1;i++) printf "%s ",a[i%n];
print}' file{,,,}
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
1 2 3
2 3 4
3 4 5
4 5 6
5 6 7
6 7 8
7 8 9
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
6 7 8 9
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
multiple scans of the input file (number of commas). Used seq 9 as the input file.
Another in awk:
{ a[NR]=$0 }
END {
o[0]=ORS
for(i=2;i<=5;i++)
for(j=1;j<=length(a);j++) {
printf "%s%s", a[j], (++k==i?o[k=0]:OFS)
if(!k&&j!=length(a)) j-=(i-1)
}
}

AWK - removal of the same fields on the basis of the "$1"

I have a file1:
6
3
6
9
2
6
This command prints the result:
awk 'NR==1{a=$1};$0!=a' file1
3
9
2
Now I have file2:
6 1 2 3 4 5
3 3 4 4 4 6
6 5 2 2 5 1
9 1 3 5 4 1
2 5 6 4 8 5
6 1 5 2 3 1
I want to do the same thing, but with file2. I want to print out the result:
3 3 4 4 5 6
9 5 3 2 8 1
2 5 6 5 3 1
5 4 1
2
I want to do it in awk. Thank you for your help.
AWK is not really suited for what you are trying to do, since it is made for processing rows one at a time, while you are trying to shift numbers up and down between different rows. That said, this monster should do what you want:
awk 'NR==1{nc=NF;for(i=1;i<=nc;i++)a[i]=$i}{for(i=1;i<=nc;i++){if($i!=a[i]){v[m[i]++,i]=$i;if(m[i]>nl)nl=m[i]}}}END{for(l=0;l<nl;l++){for(i=1;i<=nc;i++){if(l<m[i]){printf("%d ", v[l,i])}else{printf(" ")}}printf("\n")}}'
If, on the other hand, your matrix of numbers had been transposed, this task would have been far simpler:
awk '{for(i=2;i<=NF;i++)if($i!=$1)printf(" %d",$i);printf("\n")}'