how to append a file to the second column of another tsv file - awk

I have a file first.txt that looks like this :
45
56
74
62
I want to append this file to second.tsv that looks like this(there are 17 columns) :
2 a ...
3 b ...
5 c ...
6 d ...
The desired output is :
2 45 a ...
3 56 b ...
5 74 c ...
6 62 d ...
How can I append to the second column?
I've tried
awk -F, '{getline f1 <"first.txt" ;print $1,f1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17}' second.tsv
but did not work. This added the columns of first.txt to the last column of second.tsv, and it was not tab separated.
Thank you.

Your code works if you remove the -F, bit. This tells awk that the file is comma-separated, which it is not.
Another option would be to go for a piped version with paste, e.g.:
paste first.tsv second.tsv | awk '{ t=$2; $2=$1; $1=t } 1' OFS='\t'
Output:
2 45 a ...
3 56 b ...
5 74 c ...
6 62 d ...

$ awk 'NR==FNR{a[FNR]=$0;next} {$1=$1 OFS a[FNR]} 1' file1 file2
2 45 a ...
3 56 b ...
5 74 c ...
6 62 d ...
If your files are tab-separated add BEGIN{FS=OFS="\t"} at the front.

Related

How to update one file's column from another file's column in awk

I have two files, the first file:
1 AA
2 BB
3 CC
4 DD
and the second file
15 AA
17 BB
20 CC
25 FF
File 1 should be updated and the expected output should looks like this:
15 AA
17 BB
20 CC
4 DD
I have tried this script from another post but it didn't work
awk 'NR==FNR{a[$1]=$2;next}a[$1]{print $2,a[$1]}' file1 file2
$ awk 'NR==FNR{a[$2]=$1; next} $2 in a{$1=a[$2]} 1' file2 file1
15 AA
17 BB
20 CC
4 DD
Here is an awk:
awk 'FNR==NR{f2[$2]=$0; next}
$2 in f2 {print f2[$2]; next}
1' file2 file1
Prints:
15 AA
17 BB
20 CC
4 DD

AWK program that can read a second file either from a file specified on the command line or from data received via a pipe

I have an AWK program that does a join of two files, file1 and file2. The files are joined based on a set of columns. I placed the AWK program into a bash script that I named join.sh. See below. Here is an example of how the script is executed:
./join.sh '1,2,3,4' '2,3,4,5' file1 file2
That says this: Do a join of file1 and file2, using columns (fields) 1,2,3,4 of file1 and columns (fields) 2,3,4,5 of file2.
That works great.
Now what I would like to do is to filter file2 and pipe the results to the join tool:
./fetch.sh ident file2 | ./join.sh '1,2,3,4' '2,3,4,5' file1
fetch.sh is a bash script containing an AWK program that fetches the rows in file2 with primary key ident and outputs to stdout the rows that were fetched.
Unfortunately, that pipeline is not working. I get no results.
Recap: I want the join program to be able to read the second file either from a file that I specify on the command line or from data received via a pipe. How to do that?
Here is my bash script, named join.sh
#!/bin/bash
awk -v f1cols=$1 -v f2cols=$2 '
BEGIN { FS=OFS="\t"
m=split(f1cols,f1,",")
n=split(f2cols,f2,",")
}
{ sub(/\r$/, "") }
NR == 1 { b[0] = $0 }
(NR == FNR) && (NR > 1) { idx2=$(f2[1])
for (i=2;i<=n;i++)
idx2=idx2 $(f2[i])
a[idx2] = $0
next
}
(NR != FNR) && (FNR == 1) { print $0, b[0] }
FNR > 1 { idx1=$(f1[1])
for (i=2;i<=m;i++)
idx1=idx1 $(f1[i])
for (idx1 in a)
print $0, a[idx1]
}' $3 $4
I'm not sure if this is 'correct' as you haven't provided any example input and expected output, but does using - to signify stdin work for your use-case? E.g.
cat file1
1 2 3 4
AA BB CC DD
AA EE FF GG
cat file2
1 2 3 4
AA ZZ YY XX
AA 11 22 33
./join.sh '1' '1' file1 file2
1 2 3 4 1 2 3 4
AA ZZ YY XX AA BB CC DD
AA ZZ YY XX AA EE FF GG
AA 11 22 33 AA BB CC DD
AA 11 22 33 AA EE FF GG
cat file2 | ./join.sh '1' '1' file1 -
1 2 3 4 1 2 3 4
AA ZZ YY XX AA BB CC DD
AA ZZ YY XX AA EE FF GG
AA 11 22 33 AA BB CC DD
AA 11 22 33 AA EE FF GG
be able to read(...)from data received via a pipe
GNU AWK does support Using getline from a Pipe consider following simple example
awk 'BEGIN{cmd="seq 7";while((cmd | getline) > 0){print $1*7};close(cmd)}' emptyfile
gives output
7
14
21
28
35
42
49
Explanation: I process output of seq 7 command (numbers from 1 to 7 inclusive, each on separate line), body of while is executed for each line of seq 7 output, fields are set like for normal processing.

Processing 2 files with different field separators using awk

Let's say I have 2 files :
$ cat file1
A:10
B:5
C:12
$ cat file2
100 A
50 B
42 C
I'd like to have something like :
A 10 100
B 5 50
C 12 42
I tried this :
awk 'BEGIN{FS=":"}NR==FNR{a[$1]=$2;next}{FS=" ";print $2,a[$2],$1}' file1 file2
Which outputs me that :
100 A
B 5 50
C 12 42
I guess the problem comes from the Field Separator which is set too late for the second file. How can I set different field separator for different files (and not for a single file) ?
Thanks
Edit: a more general case
With file2 and file3 like this :
$ cat file3
A:10 foo
B:5 bar
C:12 baz
How to get :
A 10 foo 100
B 5 bar 50
C 12 baz 42
Just set FS between files:
awk '...' FS=":" file1 FS=" " file2
i.e.:
$ awk 'NR==FNR{a[$1]=$2;next}{print $2,a[$2],$1}' FS=":" file1 FS=" " file2
A 10 100
B 5 50
C 12 42
You need to get awk to re-split $0 after you change FS.
You can do that with $0=$0 (for example).
So {FS=" ";$0=$0;...} in your final block will do what you want.
Though only doing that the first time you need to change FS will likely perform slightly better for large files.
You can try something like:
$ cat f1
A:10
B:5
C:12
$ cat f2
100 A
50 B
42 C
$ awk 'NR==FNR{split($0,tmp,/:/);a[tmp[1]]=tmp[2];next}$2 in a{print $2,a[$2],$1}' f1 f2
A 10 100
B 5 50
C 12 42
or set multiple field separators
$ awk -F"[: ]" 'NR==FNR{a[$1]=$2;next}$2 in a{print $2,a[$2],$1}' f1 f2
A 10 100
B 5 50
C 12 42

rearrange columns using awk or cut command

I have large file with 1000 columns. I want to rearrange so that last column should be the 3rd column. FOr this i have used,
cut -f1-2,1000,3- file > out.txt
But this does not change the order.
Could anyone help using cut or awk?
Also, I want to rearrange columns 10 and 11 as shown below:
Example:
1 10 11 2 3 4 5 6 7 8 9 12 13 14 15 16 17 18 19 20
try this awk one-liner:
awk '{$3=$NF OFS $3;$NF=""}7' file
this is moving the last col to the 3rd col. if you have 1000, then it does it with 1000th col.
EDIT
if the file is tab-delimited, you could try:
awk -F'\t' -v OFS="\t" '{$3=$NF OFS $3;$NF=""}7' file
EDIT2
add an example:
kent$ seq 20|paste -s -d'\t'
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
kent$ seq 20|paste -s -d'\t'|awk -F'\t' -v OFS="\t" '{$3=$NF OFS $3;$NF=""}7'
1 2 20 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
EDIT3
You didn't give any input example. so assume you don't have empty columns in original file. (no continuous multi-tabs):
kent$ seq 20|paste -s -d'\t'|awk -F'\t' -v OFS="\t" '{$3=$10 FS $11 FS $3;$10=$11="";gsub(/\t+/,"\t")}7'
1 2 10 11 3 4 5 6 7 8 9 12 13 14 15 16 17 18 19 20
After all we could print those fields in a loop.
I THINK what you want is:
awk 'BEGIN{FS=OFS="\t"} {$3=$NF OFS $3; sub(OFS "[^" OFS "]*$","")}1' file
This might also work for you depending on your awk version:
awk 'BEGIN{FS=OFS="\t"} {$3=$NF OFS $3; NF--}1' file
Without the part after the semi-colon you'll have trailing tabs in your output.
Since many people are searching for this and even the best awk solution is not really pretty and easy to use I wanted to post my solution (mycut) written in Python:
#!/usr/bin/env python3
import sys
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE,SIG_DFL)
#example usage: cat file | mycut 3 2 1
columns = [int(x) for x in sys.argv[1:]]
delimiter = "\t"
for line in sys.stdin:
parts = line.split(delimiter)
print("\t".join([parts[col] for col in columns]))
I think about adding the other features of cut like changing the delimiter and a feature to use a * to print the remaning columns. But then it will get an own page.
A shell wrapper function for awk' that uses simpler syntax:
# Usage: rearrange int_n [int_o int_p ... ] < file
rearrange ()
{
unset n;
n="{ print ";
while [ "$1" ]; do
n="$n\$$1\" \" ";
shift;
done;
n="$n }";
awk "$n" | grep '\w'
}
Examples...
echo foo bar baz | rearrange 2 3 1
bar baz foo
Using bash brace expansion, rearrange first and last 5 items in descending order:
echo {1..1000}a | tr '\n' ' ' | rearrange {1000..995} {5..1}
1000a 999a 998a 997a 996a 995a 5a 4a 3a 2a 1a
Sorted 3-letter shells in /bin:
ls -lLSr /bin/?sh | rearrange 5 9
150792 /bin/csh
154072 /bin/ash
771552 /bin/zsh
1554072 /bin/ksh

compare a text file with another files

I have a file named file.txt as shown below
12 2
15 7
134 8
154 12
155 16
167 6
175 45
45 65
812 54
I have another five files named A.txt, B.txt, C.txt, D.txt, E.txt. The contents of these files are shown below.
A.txt
45
134
B.txt
15
812
155
C.txt
12
154
D.txt
175
E.txt
167
I need to check, which file contains the values of first column of file.txt exists and print the name of the file as third column.
Output:-
12 2 C
15 7 B
134 8 A
154 12 C
155 16 B
167 6 E
175 45 D
45 65 A
812 54 B
This should work:
One-liner:
awk 'FILENAME != "file.txt"{ a[$1]=FILENAME; next } $1 in a { $3=a[$1]; sub(/\..*/,"",$3) }1' {A..E}.txt file.txt
Formatted with comments:
awk '
#Check if the filename is not of the main file
FILENAME != "file.txt" {
#Create a hash. Store column 1 values of look up files as key and assign filename as values
a[$1]=FILENAME
#Skip the rest of the action
next
}
#Check the first column of main file is a key in the hash
$1 in a {
#If the key exists, assign the value of the key (which is filename) as Column 3 of main file
$3=a[$1]
#Using sub function, strip the extension of the file name as desired in your output
sub(/\..*/,"",$3)
#1 is a non-zero value forcing awk to print. {A..E} is brace expansion of your files.
}1' {A..E}.txt file.txt
Note: The main file needs to be passed at the end.
Test:
[jaypal:~/Temp] awk 'FILENAME != "file.txt"{ a[$1]=FILENAME; next } $1 in a { $3=a[$1]; sub(/\..*/,"",$3) ; printf "%-5s%-5s%-5s\n",$1,$2,$3}' {A..E}.txt file.txt
12 2 C
15 7 B
134 8 A
154 12 C
155 16 B
167 6 E
175 45 D
45 65 A
812 54 B
#! /usr/bin/awk -f
FILENAME == "file.txt" {
a[FNR] = $0;
c=FNR;
}
FILENAME != "file.txt" {
split(FILENAME, name, ".");
k[$1] = name[1];
}
END {
for (line = 1; line <= c; line++) {
split(a[line], seg, FS);
print a[line], k[seg[1]];
}
}
# $ awk -f script.awk *.txt
This solution does not preserve the order
join <(sort file.txt) \
<(awk '
FNR==1 {filename = substr(FILENAME, 1, length(FILENAME)-4)}
{print $1, filename}
' [ABCDE].txt |
sort) |
column -t
12 2 C
134 8 A
15 7 B
154 12 C
155 16 B
167 6 E
175 45 D
45 65 A
812 54 B