Concatenate/merge columns - awk

File (~50,000 columns):
A1 2 123 f f j j k k
A2 10 789 f o p f m n
Output
A1 2 123 ff jj kk
A2 10 789 fo pf mn
I basically want to concatenate every two columns into one, starting from column 4. How can this be done in awk or sed?

It is possible in awk. See below
:~/t> more test.txt
A1 2 123 f f j j k k
:~/t> awk '{for(i=j=4; i < NF; i+=2) {$j = $i$(i+1); j++} NF=j-1}1' test.txt
A1 2 123 ff jj kk
Sorry, I just noticed you gave two lines as an example:
:~/t> more test.txt
A1 2 123 f f j j k k
A2 10 789 f o p f m n
:~/t> awk '{for(i=j=4; i < NF; i+=2) {$j = $i$(i+1); j++} NF=j-1}1' test.txt
A1 2 123 ff jj kk
A2 10 789 fo pf mn
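The one-liner above shortens the record by assigning to NF. That works in gawk, but the gawk manual cautions that some awk versions do not rebuild $0 when NF is decreased. A minimal sketch that avoids touching NF by building the output line as a string; the function name merge_pairs is only an illustrative assumption, and it assumes every line has at least three fields:

```shell
# merge_pairs: hypothetical helper; keeps fields 1-3 and joins every
# following pair of fields into one. A leftover unpaired field is kept as-is.
merge_pairs() {
    awk '{
        out = $1 OFS $2 OFS $3
        for (i = 4; i <= NF; i += 2) {
            pair = $i
            if (i < NF) pair = pair $(i + 1)   # glue the partner field on
            out = out OFS pair
        }
        print out
    }' "$@"
}

# Usage: merge_pairs test.txt
```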

Related

matching rows and fields from two files

I would like to match the record number in one file with the same field number in another file:
file1:
1
3
5
4
3
1
5
file2:
A B C D E F G
H I J J K L M
N O P Q R S T
I would like to use the record numbers corresponding to 5 in the first file to obtain the corresponding fields in the second file. Desired output:
C G
J M
P T
So far, I've done:
awk '{ if ($1=="5") print NR }' file1 > temp
for i in $(cat temp); do
awk '{ print $"'${i}'" }' file2
done
But get the output:
C
J
P
G
M
T
I would like to have this in the format of the desired output above, but can't get it to work. Perhaps using printf or an awk for-loop might work, but I have had no success.
Thank you all.
awk 'NR==FNR{if($1==5)a[NR];next}{for(i in a){printf "%s ", $i}print ""}' file1 file2
C G
J M
P T
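One caveat with the answer above: awk does not specify the order in which for (i in a) visits the array, so the columns may come out in a different order on another awk. A minimal sketch that records the matching row numbers in a counted list and replays them in file order; the function name pick_fields and its argument order are assumptions made for illustration:

```shell
# pick_fields: hypothetical helper.
# usage: pick_fields VALUE file1 file2
pick_fields() {
    awk -v want="$1" '
        NR == FNR {                       # first file: note rows equal to VALUE
            if ($1 == want) cols[++n] = FNR
            next
        }
        {                                 # second file: print those fields in order
            line = ""
            for (k = 1; k <= n; k++) {
                if (k > 1) line = line OFS
                line = line $(cols[k])
            }
            print line
        }' "$2" "$3"
}

# Usage: pick_fields 5 file1 file2
```

This also drops the trailing space that the printf-based loop leaves at the end of each line.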

Comparing fields of two files in awk

I want to compare two fields of two files, such as follows:
Compare the 2nd field of file one with the 1st field of file two, and print the match (even if the match is repeated) along with all the columns of file one and two.
File 1:
G4 b45 3 4
G4 b45 1 3
G3 b23 2 2
G3 b22 2 6
G3 b22 2 4
File 2:
b45 a b c
b64 d e f
b23 g h i
b22 j k l
b20 m n o
Output:
G4 b45 a b c 3 4
G4 b45 a b c 1 3
G3 b23 g h i 2 2
G3 b22 j k l 2 6
G3 b22 j k l 2 4
I have tried this with the following awk command using associative arrays:
awk 'FNR==NR {array1[$2] = $1 ; arrayrest[$2] = substr($0, index($0, $2)); next}($1 in array1) {print array1[$1] "\t" $0 "\t" arrayrest[$1]}' file1 file2
But there are two problems:
It does not print the lines if the match is repeated while I want them to be printed.
It repeats the first field of file two in the output.
How could I make this awk command work nicely? Thanks in advance.
Not quite the exact output formatting you want but the right output contents.
awk 'FNR==NR{seen[$1]=$0; next} ($2 in seen) {$2=seen[$2]}7' file2 file1
Add | column -t to get more consistent column spacing.
This should be simple and clear:
awk 'NR==FNR {n[$1]=$0; next} ($2 in n) {print $1, n[$2], $3, $4}' file2 file1
small awk
awk '{x[$1]=$0}$2=x[$2]' f2 f1
If $1 and $2 can contain the same value
awk '{x[$1]=$0}FNR!=NR&&$2=x[$2]' f2 f1
output
G4 b45 a b c 3 4
G4 b45 a b c 1 3
G3 b23 g h i 2 2
G3 b22 j k l 2 6
G3 b22 j k l 2 4

Using an array in AWK when working with two files

I have two files that I merged on a key using the code below.
file1
-------------------------------
1 a t p bbb
2 b c f aaa
3 d y u bbb
2 b c f aaa
2 u g t ccc
2 b j h ccc
file2
--------------------------------
1 11 bbb
2 22 ccc
3 33 aaa
4 44 aaa
I merged these two files on the key using the code below:
awk 'NR==FNR{a[$3]=$0;next;}{for(x in a){if(x==$5) print $1,$2,$3,$4,a[x]};
My question is how I can save $2 of file2 in a variable or array and print it again after a[x].
My desired result is :
1 a t p 1 11 bbb 11
2 b c f 3 33 aaa 33
2 b c f 4 44 aaa 44
3 d y u 1 11 bbb 11
2 b c f 3 33 aaa 33
2 b c f 4 44 aaa 44
2 u g t 2 22 ccc 22
2 b j h 2 22 ccc 22
As you can see, the first 7 columns are the result of my merge code. I need to add the last column (field 2 of a[x]) to my result.
Important:
My next question: if I have a .awk file, how can I use shell constructs such as (| column -t) or send the result to a file (awk... > result.txt)? I always use these at the command prompt. Can I use them inside the code in my .awk file?
Simply add all of file2 to an array, and use split to hold the bits you want:
awk 'FNR==NR { two[$0]++; next } { for (i in two) { split(i, one); if (one[3] == $NF) print $1,$2,$3,$4, i, one[2] } }' file2 file1
Results:
1 a t p 1 11 bbb 11
2 b c f 3 33 aaa 33
2 b c f 4 44 aaa 44
3 d y u 1 11 bbb 11
2 b c f 3 33 aaa 33
2 b c f 4 44 aaa 44
2 u g t 2 22 ccc 22
2 b j h 2 22 ccc 22
Regarding your last question; you can also add 'pipes' and 'writes' inside of your awk. Here's an example of a pipe to column -t:
Contents of script.awk:
FNR==NR {
    two[$0]++
    next
}
{
    for (i in two) {
        split(i, one)
        if (one[3] == $NF) {
            print $1,$2,$3,$4, i, one[2] | "column -t"
        }
    }
}
Run like: awk -f script.awk file2 file1
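The answer mentions 'writes' but only shows the pipe. As a minimal sketch of the write case, print can redirect to a file from inside awk with >. The scratch directory, the out variable, and the result.txt name are assumptions chosen for illustration; the sample data is the question's:

```shell
workdir=$(mktemp -d)   # scratch directory for the sample data (assumption)
printf '%s\n' '1 a t p bbb' '2 b c f aaa' '3 d y u bbb' \
              '2 b c f aaa' '2 u g t ccc' '2 b j h ccc' > "$workdir/file1"
printf '%s\n' '1 11 bbb' '2 22 ccc' '3 33 aaa' '4 44 aaa' > "$workdir/file2"

awk -v out="$workdir/result.txt" '
    FNR == NR { two[$0]++; next }
    {
        for (i in two) {
            split(i, one)
            if (one[3] == $NF) {
                # "> out" sends this line to the file instead of stdout
                print $1, $2, $3, $4, i, one[2] > out
            }
        }
    }
    END { close(out) }
' "$workdir/file2" "$workdir/file1"
```

Inside awk, > truncates the file once when it is first opened and then keeps appending for the rest of the run, so all matching rows end up in the same file.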
EDIT:
Add the following to your shell script:
results=$(awk '
    FNR==NR {
        two[$0]++
        next
    }
    {
        for (i in two) {
            split(i, one)
            if (one[3] == $NF) {
                print $1,$2,$3,$4, i, one[2] | "column -t"
            }
        }
    }
' "$1" "$2")
echo "$results"
Run like:
./script.sh file2.txt file1.txt
Results:
1 a t p 1 11 bbb 11
2 b c f 3 33 aaa 33
2 b c f 4 44 aaa 44
3 d y u 1 11 bbb 11
2 b c f 3 33 aaa 33
2 b c f 4 44 aaa 44
2 u g t 2 22 ccc 22
2 b j h 2 22 ccc 22
Your current script is:
awk 'NR==FNR { a[$3]=$0; next }
{ for (x in a) { if (x==$5) print $1,$2,$3,$4,a[x] } }'
(Actually, the original is missing the second close brace for the second pattern/action pair.)
It seems that you process file2 before you process file1.
You shouldn't need the loop in the second code. And you can make life easier for yourself by using the splitting in the first phase to keep the values you need:
awk 'NR==FNR { c1[$3] = $1; c2[$3] = $2; next }
{ print $1, $2, $3, $4, c1[$5], c2[$5], $5, c2[$5] }' file2 file1
You can upgrade that to check whether c1[$5] and c2[$5] are defined, presumably skipping the row if they are not.
Given your input files, the output is:
1 a t p 1 11 bbb 11
2 b c f 4 44 aaa 44
3 d y u 1 11 bbb 11
2 b c f 4 44 aaa 44
2 u g t 2 22 ccc 22
2 b j h 2 22 ccc 22
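Give or take column spacing, that matches the requested output except for the rows with duplicate keys: file2 has two aaa rows, and since c1 and c2 keep only the last value stored per key, the "3 33 aaa" matches are missing. Column spacing can be fixed by using printf instead of print, or setting OFS to tab, or ...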
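The check mentioned earlier (skipping a row when its key never appeared in file2) can be sketched with the in operator. The function name merge_known_keys is a hypothetical label, not part of the original answer:

```shell
# merge_known_keys: hypothetical helper.
# usage: merge_known_keys file2 file1
merge_known_keys() {
    awk '
        NR == FNR { c1[$3] = $1; c2[$3] = $2; next }
        # "$5 in c1" tests for the key without creating an empty entry
        $5 in c1  { print $1, $2, $3, $4, c1[$5], c2[$5], $5, c2[$5] }
    ' "$1" "$2"
}

# Usage: merge_known_keys file2 file1
```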
The c1 and c2 notations for column 1 and 2 is OK for two columns. If you need more, then you should probably use the 2D array notation:
awk 'NR==FNR { for (i = 1; i <= NF; i++) col[i,$3] = $i; next }
{ print $1, $2, $3, $4, col[1,$5], col[2,$5], $5, col[2,$5] }' file2 file1
This produces the same output as before.
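Both variants above keep only one file2 row per key, which is why the output shows six rows rather than the eight requested. A minimal sketch that keeps every file2 row by counting the rows per key; the names merge_all_matches, cnt, row and col2 are assumptions made for illustration:

```shell
# merge_all_matches: hypothetical helper.
# usage: merge_all_matches file2 file1
merge_all_matches() {
    awk '
        NR == FNR {                 # file2: store every row under key,counter
            k = $3
            cnt[k]++
            row[k, cnt[k]]  = $0
            col2[k, cnt[k]] = $2
            next
        }
        {                           # file1: emit one line per stored match
            for (j = 1; j <= cnt[$5]; j++)
                print $1, $2, $3, $4, row[$5, j], col2[$5, j]
        }
    ' "$1" "$2"
}

# Usage: merge_all_matches file2 file1
```

Because the matches are replayed by counter rather than with for (x in a), the rows for a duplicated key come out in the order they appear in file2.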
To achieve what you ask, append the second field to the whole line when processing the first file: a[$3]=$0 OFS $2. For your second question, awk has a variable that separates fields in the output, OFS; assign a tab to it and experiment. Your script would look like:
awk '
    BEGIN { OFS = "\t"; }
    NR==FNR{
        a[$3]=$0 OFS $2;
        next;
    }
    {
        for(x in a){
            if(x==$5) print $1,$2,$3,$4,a[x]
        }
    }
' file2 file1
That yields:
1 a t p 1 11 bbb 11
2 b c f 4 44 aaa 44
3 d y u 1 11 bbb 11
2 b c f 4 44 aaa 44
2 u g t 2 22 ccc 22
2 b j h 2 22 ccc 22

Combine 2 files with AWK based on last columns [closed]

I have two files:
file1
-------------------------------
1 a t p b
2 b c f a
3 d y u b
2 b c f a
2 u g t c
2 b j h c
file2
--------------------------------
1 a b
2 p c
3 n a
4 4 a
I want to combine these 2 files based on their last columns (column 5 of file1 and column 3 of file2) using awk.
result
----------------------------------------------
1 a t p 1 a b
2 b c f 3 n a
2 b c f 4 4 a
3 d y u 1 a b
2 b c f 3 n a
2 b c f 4 4 a
2 u g t 2 p c
2 b j h 2 p c
At the very beginning, I didn't notice the duplicated "a" in file2, so I thought it could be solved with normal array matching. ... Now it works.
An awk one-liner:
awk 'NR==FNR{a[$3"_"NR]=$0;next;}{for(x in a){if(x~"^"$5) print $1,$2,$3,$4,a[x];}}' f2.txt f1.txt
test
kent$ head *.txt
==> f1.txt <==
1 a t p b
2 b c f a
3 d y u b
2 b c f a
2 u g t c
2 b j h c
==> f2.txt <==
1 a b
2 p c
3 n a
4 4 a
kent$ awk 'NR==FNR{a[$3"_"NR]=$0;next;}{for(x in a){if(x~"^"$5) print $1,$2,$3,$4,a[x];}}' f2.txt f1.txt
1 a t p 1 a b
2 b c f 3 n a
2 b c f 4 4 a
3 d y u 1 a b
2 b c f 3 n a
2 b c f 4 4 a
2 u g t 2 p c
2 b j h 2 p c
Note: the output format is not pretty, but it is acceptable if you pipe it to column -t.
Another way, assuming the files have no headers:
awk '
    FNR == NR {
        f2[ $NF ] = f2[ $NF ] ? f2[ $NF ] SUBSEP $0 : $0;
        next;
    }
    FNR < NR {
        if ( $NF in f2 ) {
            len = split( f2[ $NF ], a, SUBSEP );
            for ( i = 1; i <= len; i++ ) {
                $NF = a[ i ];
                # print one output row per matching file2 line
                printf "%s\n", $0;
            }
        }
    }
' file2 file1 | column -t
It yields:
1 a t p 1 a b
2 b c f 3 n a
2 b c f 4 4 a
3 d y u 1 a b
2 b c f 3 n a
2 b c f 4 4 a
2 u g t 2 p c
2 b j h 2 p c
This is a bit easier in a language that supports arbitrary data structures (a list of lists). Here's Ruby:
# read "file2" and group by the last field
file2 = File .foreach('file2') .map(&:split) .group_by {|fields| fields[-1]}
# process file1
File .foreach('file1') .map(&:split) .each do |fields|
  file2[fields[-1]] .each do |fields2|
    puts (fields[0..-2] + fields2).join(" ")
  end
end
outputs
1 a t p 1 a b
2 b c f 3 n a
2 b c f 4 4 a
3 d y u 1 a b
2 b c f 3 n a
2 b c f 4 4 a
2 u g t 2 p c
2 b j h 2 p c

Looping and merging 2 files

First of all, please pardon me, I am a noob.
My problem is as follows:
I have 2 text files - file1 and file2.
Following are the file samples and the desired output:
file1:
A B C
D E F
G H I
file2:
a1 a2 a3
b1 b2 b3
c1 c2 c3
Desired output:
A B C a1 a2 a3
A B C b1 b2 b3
A B C c1 c2 c3
D E F a1 a2 a3
D E F b1 b2 b3
D E F c1 c2 c3
and so on.
Can anybody please help me out with this?
awk 'FNR == NR {file2[FNR] = $0; c++; next} {for (i = 1; i <= c; i++) {print $0, file2[i]}}' file2 file1
Read all the lines of file2 into an array. For each line of file1, loop through the array and print the line from file1 and the line from file2.
In Bash:
while read -r line
do
    file2+=("$line")
done < file2
while read -r line
do
    for line2 in "${file2[@]}"
    do
        echo "$line $line2"
    done
done < file1