Merging two files by ignoring a line with AWK

I have two files:
file 1:
a 1 2 3
b 1 2 3
c 1 2 3
d 1 2 3
file 2:
hola
l m n o p q
Now I want to merge them in one single file by ignoring the header of file 2 like this:
a 1 2 3 l m n o p q
b 1 2 3
c 1 2 3
d 1 2 3
Does anyone have an idea how to do this?

The same output can also be achieved without awk:
$ cat file1
a 1 2 3
b 1 2 3
c 1 2 3
d 1 2 3
$ cat file2
hola
l m n o p q
$ pr -mtJS' ' file1 <(tail -n +2 file2)
a 1 2 3 l m n o p q
b 1 2 3
c 1 2 3
d 1 2 3
$ paste -d ' ' file1 <(tail -n +2 file2)
a 1 2 3 l m n o p q
b 1 2 3
c 1 2 3
d 1 2 3

$ awk 'NR==FNR{if(NR>1)a[NR-1]=$0;next}{print $0,a[FNR]}' file2 file1
a 1 2 3 l m n o p q
b 1 2 3
c 1 2 3
d 1 2 3
Brief explanation:
NR==FNR{if(NR>1)a[NR-1]=$0;next}: while reading file2 (the first file argument), skip the header and save each record from the second one onward in a[NR-1]. Note: this also works if file2 grows to more lines.
print $0,a[FNR]: for each record of file1, print $0 followed by a[FNR], where FNR is the record number within file1.
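The same idea can be sketched in Python, for readers who find the NR==FNR idiom hard to follow. This is only an illustration under assumptions: the function name merge_skip_header and the header_rows parameter are made up for this example, and, unlike the awk command, it strips the trailing separator from lines that have no partner in file2.

```python
def merge_skip_header(left_lines, right_lines, header_rows=1):
    """Append the i-th non-header line of right_lines to the i-th line of left_lines."""
    # Plays the role of the a[NR-1] array in the awk answer.
    extra = right_lines[header_rows:]
    merged = []
    for i, line in enumerate(left_lines):
        tail = extra[i] if i < len(extra) else ""
        # Lines without a partner would end in a space; strip it.
        merged.append(f"{line} {tail}".rstrip())
    return merged


file1 = ["a 1 2 3", "b 1 2 3", "c 1 2 3", "d 1 2 3"]
file2 = ["hola", "l m n o p q"]
result = merge_skip_header(file1, file2)
print("\n".join(result))
```

Reading real files would only require replacing the two lists with the result of open(...).read().splitlines().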

Related

Merging multiple files with null values using AWK

Sorry, I am posting this again because I messed up my earlier post.
I am interested in joining multiple files (e.g., file1, file2, file3, ...) on matching values in column 1 to get the output shown below. I would appreciate any help:
file1:
A 2 3 4
B 3 7 8
C 4 6 9
file2:
A 7 6 3
C 2 4 7
D 1 6 4
file3:
A 3 2 7
B 4 7 3
E 3 6 8
Output:
A 2 3 4 7 6 3 3 2 7
B 3 7 8 n n n 4 7 3
C 4 6 9 2 4 7 n n n
D n n n 1 6 4 n n n
E n n n n n n 3 6 8
Here is one for awk. Tested with GNU awk, mawk, original-awk (i.e. awk 20121220) and Busybox awk:
$ awk '
function nn(c, b,i) {
    if(c)
        for(i=1;i<=c;i++)
            b=b "n "
    return b
}
FNR==1{nf+=(NF-1)}
{
    for(i=2;i<=NF;i++)
        b[$1]=b[$1] $i OFS
    a[$1]=a[$1] (n[$1]<(nf-NF+1)?nn(nf-NF+1-n[$1]):"") b[$1]
    n[$1]=nf+0
    delete b[$1]
}
END{
    for(i in a)
        print i,a[i] (n[i]<(nf)?nn(nf-n[i]):"")
}' file1 file2 file3
Output:
A 2 3 4 7 6 3 3 2 7
B 3 7 8 n n n 4 7 3
C 4 6 9 2 4 7 n n n
D n n n 1 6 4 n n n
E n n n n n n 3 6 8
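For comparison, here is a minimal Python sketch of the same padded join. It is not a translation of the awk script: the function name outer_join and the filler parameter are made up for this example, each file is assumed to be already parsed into a dict from key to value list, and every row within one file is assumed to have the same number of values. Keys are sorted explicitly, whereas the order of awk's for(i in a) loop is unspecified.

```python
def outer_join(tables, filler="n"):
    """Join dicts (key -> list of values) on their keys, padding missing rows."""
    # Width of each table, taken from its first row (assumed uniform).
    widths = [len(next(iter(t.values()))) if t else 0 for t in tables]
    keys = sorted(set().union(*tables))
    rows = []
    for key in keys:
        row = [key]
        for table, width in zip(tables, widths):
            # Use the row if the key is present, else `width` fillers.
            row.extend(table.get(key, [filler] * width))
        rows.append(" ".join(row))
    return rows


file1 = {"A": ["2", "3", "4"], "B": ["3", "7", "8"], "C": ["4", "6", "9"]}
file2 = {"A": ["7", "6", "3"], "C": ["2", "4", "7"], "D": ["1", "6", "4"]}
file3 = {"A": ["3", "2", "7"], "B": ["4", "7", "3"], "E": ["3", "6", "8"]}
joined = outer_join([file1, file2, file3])
print("\n".join(joined))
```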

How to do intersection match between 2 DataFrames in Pandas?

Assume there are 2 DataFrames A and B like the following:
A:
a A
b B
c C
B:
1 2
3 4
How can I produce a DataFrame C like this:
a A 1 2
a A 3 4
b B 1 2
b B 3 4
c C 1 2
c C 3 4
Is there a function in Pandas that can do this operation?
First, all values have to be unique in each DataFrame.
I think you need product:
import pandas as pd
from itertools import product
A = pd.DataFrame({'a':list('abc')})
B = pd.DataFrame({'a':[1,2]})
C = pd.DataFrame(list(product(A['a'], B['a'])))
print (C)
0 1
0 a 1
1 a 2
2 b 1
3 b 2
4 c 1
5 c 2
Pure pandas solutions with MultiIndex.from_product:
mux = pd.MultiIndex.from_product([A['a'], B['a']])
C = pd.DataFrame(mux.values.tolist())
print (C)
0 1
0 a 1
1 a 2
2 b 1
3 b 2
4 c 1
5 c 2
C = mux.to_frame().reset_index(drop=True)
print (C)
0 1
0 a 1
1 a 2
2 b 1
3 b 2
4 c 1
5 c 2
Solution using a cross join via merge, with a helper column filled with the same scalar by assign:
df = pd.merge(A.assign(tmp=1), B.assign(tmp=1), on='tmp').drop(columns='tmp')
df.columns = ['a','b']
print (df)
a b
0 a 1
1 a 2
2 b 1
3 b 2
4 c 1
5 c 2
EDIT:
A = pd.DataFrame({'a':list('abc'), 'b':list('ABC')})
B = pd.DataFrame({'a':[1,3], 'c':[2,4]})
print (A)
a b
0 a A
1 b B
2 c C
print (B)
a c
0 1 2
1 3 4
C = pd.merge(A.assign(tmp=1), B.assign(tmp=1), on='tmp').drop(columns='tmp')
C.columns = list('abcd')
print (C)
a b c d
0 a A 1 2
1 a A 3 4
2 b B 1 2
3 b B 3 4
4 c C 1 2
5 c C 3 4
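The multi-column case in the EDIT can also be sketched with itertools.product alone, by treating each row as a tuple and concatenating every pair of rows. This is just an illustration without pandas; the names A_rows, B_rows and C_rows are made up for this example.

```python
from itertools import product

# Each table is a list of row tuples (hypothetical stand-ins for A and B).
A_rows = [("a", "A"), ("b", "B"), ("c", "C")]
B_rows = [(1, 2), (3, 4)]

# Pair every row of A with every row of B and join the two tuples.
C_rows = [left + right for left, right in product(A_rows, B_rows)]

for row in C_rows:
    print(*row)
```

The rows come out in the same order as the cross join above: all rows of B for the first row of A, then for the second, and so on.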

move a specific column down using awk

How can I move the second column one row down, as shown in the example below?
> input
n an na na
a ae 1 2 3
b be 3 2 1
c 4 4 4
> output
n na na
a an 1 2 3
b be 3 2 1
c be 4 4 4
This awk one-liner does the job for you:
awk '{t=$2;$2=p;p=t}7' file
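In the one-liner, t holds the current second field, $2 is overwritten with the value p remembered from the previous line, p is then updated, and the trailing 7 is an always-true pattern that prints the modified record. A rough Python sketch of that logic, assuming whitespace-separated fields as in awk's default splitting (the function name shift_second_field_down is made up for this example):

```python
def shift_second_field_down(lines):
    """Replace field 2 of each line with field 2 of the previous line."""
    previous = ""  # like awk's p, empty before the first line
    shifted = []
    for line in lines:
        fields = line.split()
        # Assigning $2 in awk creates the field if it is missing.
        while len(fields) < 2:
            fields.append("")
        current = fields[1]
        fields[1] = previous
        previous = current
        # awk rebuilds the record with single spaces (OFS) on assignment.
        shifted.append(" ".join(fields))
    return shifted


sample = ["n an na na", "a ae 1 2 3", "b be 3 2 1"]
moved = shift_second_field_down(sample)
print("\n".join(moved))
```

Because the fields are split on whitespace, an empty cell in the input is not seen as a field; a file with blank cells would need a fixed delimiter such as a tab instead.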

Extracting Sequential Pattern

Can anyone help me on how to write a script to extract sequential lines?
I was able to find and get a script working to create all the permutations of the given inputs, but that's not what I need.
awk 'function perm(p,s, i) {
    for(i=1;i<=n;i++)
        if(p==1)
            printf "%s%s\n",s,A[i]
        else
            perm(p-1,s A[i]", ")
}
{
    A[++n]=$1
}
END{
    perm(n)
}' infile
Unfortunately, I don't understand the script well enough to make a modification (not for lack of trying).
I need to extract 2 to 5 sequential lines/word patterns.
An illustration of what I need is as follows:
Eg.
inputfile.txt:
A
B
C
D
E
F
G
outputfile.txt:
A B
B C
C D
D E
E F
F G
A B C
B C D
C D E
D E F
E F G
A B C D
B C D E
C D E F
D E F G
A B C D E
B C D E F
C D E F G
Here's a Python answer.
General algorithm:
Load all letters into a list
For n = 2..5, where n is size of the "window". You "slide" that window over the list and print those n characters.
Python is nice for this because of list slicing.
with open('input.txt') as f_in, open('output.txt', 'w') as f_out:
    chars = f_in.read().splitlines()
    for n in range(2, 6):
        for start_window in range(len(chars) - n + 1):
            f_out.write(' '.join(chars[start_window:start_window + n]))
            f_out.write('\n')
awk to the rescue!
$ awk 'BEGIN{n=1}
FNR==1{n++}
{a[c++]=$0; c=c%n}
FNR>n-1{for(i=c;i<c+n-1;i++) printf "%s ",a[i%n];
print}' file{,,,}
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
1 2 3
2 3 4
3 4 5
4 5 6
5 6 7
6 7 8
7 8 9
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
6 7 8 9
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
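This scans the input file multiple times: file{,,,} expands to four copies of the file name (one more than the number of commas), one per window size from 2 to 5. The output above used seq 9 as the input file.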
Another in awk:
{ a[NR]=$0 }
END {
    o[0]=ORS
    for(i=2;i<=5;i++)
        for(j=1;j<=length(a);j++) {
            printf "%s%s", a[j], (++k==i?o[k=0]:OFS)
            if(!k&&j!=length(a)) j-=(i-1)
        }
}

selecting highest value in a table

I have a tab-separated file with 3 columns. For each name in column 1, I want to find the highest value in column 3 and print it as a 4th column.
input file
A l 10
A l 2
A m 6
B l 12
B m 13
B n 7
C l 9
C l 8
C n 19
Output file
A l 10 10
A l 2 10
A m 6 10
B l 12 13
B m 13 13
B n 7 13
C l 9 19
C l 8 19
C n 19 19
Could you please suggest an awk or sed command? Thanks.
You can use this awk command:
awk 'FNR==NR {arr[$1]=arr[$1]>$3?arr[$1]:$3;next} {print $0,arr[$1]}' OFS="\t" file{,}
A l 10 10
A l 2 10
A m 6 10
B l 12 13
B m 13 13
B n 7 13
C l 9 19
C l 8 19
C n 19 19
This passes over the file twice: the first pass finds the highest value for each name, the second prints every line followed by that value.
The file{,} brace expansion repeats the file name twice. You can also write file file instead.
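The two-pass idea can also be sketched in Python. This is an illustration only: the function name append_group_max is made up for this example, and the rows are assumed to be already parsed into (name, label, value) tuples with numeric values.

```python
def append_group_max(rows):
    """Return rows with the maximum value of their group appended."""
    # First pass: remember the highest value seen for each name.
    best = {}
    for name, _label, value in rows:
        if name not in best or value > best[name]:
            best[name] = value
    # Second pass: emit every row followed by its group's maximum.
    return [(name, label, value, best[name]) for name, label, value in rows]


rows = [
    ("A", "l", 10), ("A", "l", 2), ("A", "m", 6),
    ("B", "l", 12), ("B", "m", 13), ("B", "n", 7),
    ("C", "l", 9), ("C", "l", 8), ("C", "n", 19),
]
result = append_group_max(rows)
for row in result:
    print(*row, sep="\t")
```

Checking name not in best before comparing means the first value of each group is always taken as the starting maximum, so groups with only negative values are handled too.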