compare a text file with other files - awk

I have a file named file.txt as shown below
12 2
15 7
134 8
154 12
155 16
167 6
175 45
45 65
812 54
I have another five files named A.txt, B.txt, C.txt, D.txt, E.txt. The contents of these files are shown below.
A.txt
45
134
B.txt
15
812
155
C.txt
12
154
D.txt
175
E.txt
167
I need to check which of these files contains each value from the first column of file.txt, and print that file's name (without the extension) as a third column.
Output:
12 2 C
15 7 B
134 8 A
154 12 C
155 16 B
167 6 E
175 45 D
45 65 A
812 54 B

This should work:
One-liner:
awk 'FILENAME != "file.txt"{ a[$1]=FILENAME; next } $1 in a { $3=a[$1]; sub(/\..*/,"",$3) }1' {A..E}.txt file.txt
Formatted with comments:
awk '
# Check whether the current file is not the main file
FILENAME != "file.txt" {
    # Build a hash: store column 1 of the lookup files as keys and the filename as value
    a[$1] = FILENAME
    # Skip the rest of the actions
    next
}
# Check whether the first column of the main file is a key in the hash
$1 in a {
    # If the key exists, assign its value (the filename) to column 3 of the main file
    $3 = a[$1]
    # Using the sub function, strip the file extension as desired in your output
    sub(/\..*/, "", $3)
}
# 1 is a true (non-zero) pattern that forces awk to print. {A..E}.txt is shell
# brace expansion of your lookup files.
1' {A..E}.txt file.txt
Note: The main file needs to be passed last, so the lookup files are read (and the hash built) first.
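Since {A..E}.txt is shell brace expansion, on shells without it you can list the lookup files explicitly; the equivalent invocation:
awk 'FILENAME != "file.txt"{ a[$1]=FILENAME; next } $1 in a { $3=a[$1]; sub(/\..*/,"",$3) }1' A.txt B.txt C.txt D.txt E.txt file.txt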
Test:
[jaypal:~/Temp] awk 'FILENAME != "file.txt"{ a[$1]=FILENAME; next } $1 in a { $3=a[$1]; sub(/\..*/,"",$3) ; printf "%-5s%-5s%-5s\n",$1,$2,$3}' {A..E}.txt file.txt
12   2    C
15   7    B
134  8    A
154  12   C
155  16   B
167  6    E
175  45   D
45   65   A
812  54   B

#! /usr/bin/awk -f
# Buffer each line of the main file, keyed by its record number
FILENAME == "file.txt" {
    a[FNR] = $0
    c = FNR
}
# For the lookup files, map column 1 to the filename stripped of its extension
FILENAME != "file.txt" {
    split(FILENAME, name, ".")
    k[$1] = name[1]
}
# Print the buffered main-file lines with the matching filename appended
END {
    for (line = 1; line <= c; line++) {
        split(a[line], seg, FS)
        print a[line], k[seg[1]]
    }
}
# $ awk -f script.awk *.txt
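Because the script buffers file.txt in memory and only prints in the END block, the argument order does not matter here. Assuming the shebang path matches your awk, you can also run it directly:
chmod +x script.awk
./script.awk file.txt A.txt B.txt C.txt D.txt E.txt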

This solution does not preserve the original order of file.txt:
join <(sort file.txt) \
<(awk '
FNR==1 {filename = substr(FILENAME, 1, length(FILENAME)-4)}
{print $1, filename}
' [ABCDE].txt |
sort) |
column -t
12   2   C
134  8   A
15   7   B
154  12  C
155  16  B
167  6   E
175  45  D
45   65  A
812  54  B
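If the original order of file.txt matters, one workaround (a sketch, not part of the original answer) is to number its lines before joining and sort them back afterwards:
join -1 2 <(nl file.txt | sort -k2,2) \
     <(awk 'FNR==1 {filename = substr(FILENAME, 1, length(FILENAME)-4)}
            {print $1, filename}' [ABCDE].txt | sort) |
sort -k2,2n | awk '{print $1, $3, $4}' | column -t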

Related

How to insert column from a file to another file at multiple places

I would like to insert columns 1 and 2 of File2.txt into File1.txt after every second column, up to the last column.
File1.txt (tab-separated; column range 1-2400, cell range 1-4500)
ID IMPACT ID IMPACT ID IMPACT
51 0.288 128 0.4557 156 0.85
625 0.858 15 -0.589 51 0.96
8 0.845 7 0.5891
File2.txt (consists of only two tab-separated columns, with 19000 rows)
ID IMPACT
18 -1
165 -1
41 -1
11 -1
Output file
ID IMPACT ID IMPACT ID IMPACT ID IMPACT ID IMPACT ID IMPACT
51 0.288 18 -1 128 0.4557 18 -1 156 0.85 18 -1
625 0.858 165 -1 15 -0.589 165 -1 51 0.96 165 -1
8 0.845 41 -1 7 0.5891 41 -1 41 -1
11 -1 11 -1 11 -1
I tried the commands below, but they are not working.
paste <(cut -f 1,2 File1.txt) <(cut -f 1,2 File2.txt) <(cut -f 3,4 File1.txt) <(cut -f 1,2 File2.txt)......... > File3
Problem: it starts shifting the File2.txt column values into different columns after the last row of File1.txt.
paste File1.txt File2.txt > File3.txt
awk '{print $1 "\t" $2 "\t" $3 "\t" $4 "\t" $5 "\t" $6 "\t" $3 "\t" $4....}' File3.txt > File4.txt
This does the job; however, it mixes up the values of File1.txt from one column to another.
I have tried everything but failed to succeed.
Any help would be appreciated; bash or pandas solutions would also be fine. Thanks in advance.
$ awk '
BEGIN {
    FS=OFS="\t"                     # tab-separated data
}
NR==FNR {                           # hash fields of file2
    a[FNR]=$1                       # index with record numbers FNR
    b[FNR]=$2
    next
}
{                                   # print file1 records with file2 fields appended
    print $1,$2,a[FNR],b[FNR],$3,$4,a[FNR],b[FNR],$5,$6,a[FNR],b[FNR]
}
END {                               # deal with the extra records of file2
    for(i=(FNR+1);(i in a);i++)
        print "","",a[i],b[i],"","",a[i],b[i],"","",a[i],b[i]
}' file2 file1
Output:
ID IMPACT ID IMPACT ID IMPACT ID IMPACT ID IMPACT ID IMPACT
51 0.288 18 -1 128 0.4557 18 -1 156 0.85 18 -1
625 0.858 165 -1 15 -0.589 165 -1 51 0.96 165 -1
8 0.845 41 -1 7 0.5891 41 -1 41 -1
11 -1 11 -1 11 -1
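The answer above hardcodes three ID/IMPACT pairs. As a sketch of a generalization (not part of the original answer), you can loop over the pairs so any number of columns works; this version omits the extra-row handling done in the END block above:
awk '
BEGIN { FS=OFS="\t" }
NR==FNR { a[FNR]=$1; b[FNR]=$2; next }   # buffer file2 by record number
{
    out = ""
    for (i = 1; i < NF; i += 2)          # after each ID/IMPACT pair, append the file2 pair
        out = out $i OFS $(i+1) OFS a[FNR] OFS b[FNR] ((i+2) < NF ? OFS : "")
    print out
}' file2 file1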

how to append a file to the second column of another tsv file

I have a file first.txt that looks like this:
45
56
74
62
I want to append this file to second.tsv, which looks like this (there are 17 columns):
2 a ...
3 b ...
5 c ...
6 d ...
The desired output is:
2 45 a ...
3 56 b ...
5 74 c ...
6 62 d ...
How can I append to the second column?
I've tried
awk -F, '{getline f1 <"first.txt" ;print $1,f1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17}' second.tsv
but it did not work: it added the column of first.txt after the last column of second.tsv, and the output was not tab-separated.
Thank you.
Your code works if you remove the -F, bit. This tells awk that the file is comma-separated, which it is not.
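For reference, a minimal corrected version of the getline approach (a sketch: it drops -F, and forces tab-separated output by rebuilding the record):
awk 'BEGIN{OFS="\t"} {getline f1 < "first.txt"; $1=$1 OFS f1} 1' second.tsv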
Another option would be to go for a piped version with paste, e.g.:
paste first.txt second.tsv | awk '{ t=$2; $2=$1; $1=t } 1' OFS='\t'
Output:
2 45 a ...
3 56 b ...
5 74 c ...
6 62 d ...
$ awk 'NR==FNR{a[FNR]=$0;next} {$1=$1 OFS a[FNR]} 1' file1 file2
2 45 a ...
3 56 b ...
5 74 c ...
6 62 d ...
If your files are tab-separated add BEGIN{FS=OFS="\t"} at the front.
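That is, for the files in this question, presumably:
awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[FNR]=$0;next} {$1=$1 OFS a[FNR]} 1' first.txt second.tsv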

Changing the field separator of awk to newline

The -F option lets you specify the field separator for awk, but using '\n' as the field separator doesn't work as expected; that is, it doesn't make $1 the first line of the input, $2 the second line, and so on.
I suspect that this is because awk looks for the field separator within each line. Is there a way to get around this with awk, or some other Linux command? Basically, I want to separate my input by newline characters and put them into an Excel file.
I'm still warming up to Linux and shell scripts, which is the reason for my lack of creativity with this problem.
Thank you!
You may need to override the input record separator (RS), which defaults to newline.
See my example below:
$ cat test.txt
a
b
c
d
$ awk 'BEGIN{ RS = "" ; FS = "\n" }{print $1,$2,$3,$4}' test.txt
a b c d
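RS="" is awk's paragraph mode: records are separated by blank lines and, with FS="\n", each line becomes a field. If you don't know the number of lines in advance, a hedged variant that rejoins all fields of a record regardless of their count:
awk 'BEGIN{RS=""; FS="\n"; OFS=" "} {$1=$1; print}' test.txt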
Note that you can change both the input and output record separators, so you can do something like this to achieve a similar result to the accepted answer.
cat test.txt
a
b
c
d
$ awk -v ORS=" " '{print $1}' test.txt
a b c d
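This leaves a trailing space and no final newline. If that matters, a sketch of a cleaner variant:
awk '{printf "%s%s", sep, $1; sep=" "} END{print ""}' test.txt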
One can simplify it to just the following, with a minor caveat: an extra trailing space and no trailing newline:
% echo "a\nb\nc\nd"
a
b
c
d
% echo "a\nb\nc\nd" | mawk 8 ORS=' '
a b c d %
To rectify that, and also handle the edge case of no trailing newline in the input, one can modify it to:
% echo -n "a\nb\nc\nd" | mawk 'NF-=_==$NF' FS='\n' RS='^$' | odview
0000000 543301729 174334051
a b c d \n
141 040 142 040 143 040 144 012
a sp b sp c sp d nl
97 32 98 32 99 32 100 10
61 20 62 20 63 20 64 0a
0000010
% echo "a\nb\nc\nd" | mawk 'NF -= (_==$NF)' FS='\n' RS='^$' | odview
0000000 543301729 174334051
a b c d \n
141 040 142 040 143 040 144 012
a sp b sp c sp d nl
97 32 98 32 99 32 100 10
61 20 62 20 63 20 64 0a
0000010
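RS='^$' relies on regex record separators, a GNU awk/mawk extension. A POSIX-portable sketch that joins the lines with single spaces and always ends with exactly one newline:
echo "a\nb\nc\nd" | awk '{printf "%s%s", (NR>1 ? " " : ""), $0} END{if (NR) print ""}'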

join the contents of files into a new file

I have some text files as shown below. I would like to join the contents of these files into one.
file A
>AXC
145
146
147
>SDF
1
8
67
>FGH
file B
>AXC
>SDF
12
65
>FGH
123
156
190
Desired output
new file
>AXC
145
146
147
>SDF
1
8
67
12
65
>FGH
123
156
190
Your help would be appreciated!
awk '
/^>/ { key=$0; if (!seen[key]++) keys[++numKeys] = key; next }
     { vals[key] = vals[key] ORS $0 }
END  {
    for (keyNr=1; keyNr<=numKeys; keyNr++) {
        key = keys[keyNr]
        print key vals[key]
    }
}
' fileA fileB
>AXC
145
146
147
>SDF
1
8
67
12
65
>FGH
123
156
190
If you really want the leading white space added to the ">SDF" values from fileA, tell us why that's the case for that one but not ">AXC" so we can code an appropriate solution.
A bit shorter than Ed's answer:
awk '/^>/{a=$0;next}{x[a]=x[a]$0"\n"}END{for(i in x)printf"%s\n%s",i,x[i]}'
Blocks will be printed in an unspecified order.
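With GNU awk the order can at least be made deterministic (sorted lexically by header) via PROCINFO["sorted_in"], a sketch assuming gawk 4.0 or later:
gawk '/^>/{a=$0;next} {x[a]=x[a]$0"\n"}
      END{PROCINFO["sorted_in"]="@ind_str_asc"; for(i in x) printf "%s\n%s", i, x[i]}' fileA fileB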
RS=">" seperate records by > character
OFS="\n" is to have number it's own line.
a[i]=a[i] $0 add fields into array with index of first field.
rt=RT is for adding > character to index
$ awk 'BEGIN{ RS=">"; OFS="\n" }
{i=rt $1; $1=""; a[i]=a[i] $0; rt=RT; next}
END { for (i in a) {print i a[i] }}' d6 d5
>SDF
12
65
1
8
67
>FGH
123
156
190
>AXC
145
146
147

How do I add up the second column based on column 1 in awk? For example, I used the following script

zcat *.gz | awk '{print $1}' |sort| uniq -c | sed 's/^[ ]\+//g' | cut -d' ' -f1 | sort | uniq -c | sort -k1n
I get the following output:
3 648
3 655
3 671
3 673
3 683
3 717
4 18
4 29
4 31
4 34
4 652
5 12
6 24
6 33
7 13
12 10
13 9
14 8
33 7
73 6
166 5
383 4
1178 3
3945 2
26692 1
I don't want repetitions in my first column. Example: if my first column is 3, I should add up all the values in the second column that are associated with 3. Thank you.
A solution using arrays in awk (saved as num.awk):
{
    a[$1] = a[$1] + $2
}
END {
    for (i in a)
        printf("%d\t%d\n", i, a[i])
}
Pipe the output through sort -n once more to have it in ascending order:
$ awk -f num.awk numbers | sort -n
3 4047
4 764
5 12
6 57
7 13
12 10
13 9
14 8
33 7
73 6
166 5
383 4
1178 3
3945 2
26692 1
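Equivalently, as a sketch appended to the end of your original pipeline:
zcat *.gz | awk '{print $1}' | sort | uniq -c | sed 's/^[ ]\+//g' | cut -d' ' -f1 | sort | uniq -c |
awk '{a[$1]+=$2} END{for (i in a) printf "%d\t%d\n", i, a[i]}' | sort -n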
awk 'NF == 1 {c=$1; print $0} NF>1 {if (c==$1) {print "\t" $2} else {c=$1; print $0}}'
can do it, but please note that the indentation may be incorrect, as I used a simple tab (\t) above.
HTH