pasting files/multiple columns with different number of rows - awk

Hi, I was trying to paste multiple files (each with a single column but a different number of rows) together, but it didn't produce what I was expecting. How can I solve that?
paste file1.txt file2.txt file3.txt ... file100.txt > out.txt
input file 1:
A
B
C
input file 2:
D
E
input file 3:
F
G
H
I
J
.......
......
Desired output:
A D F
B E G
C H
I
J
Would the behaviour be the same if the files have multiple columns with different numbers of rows?
for example:
file1
A 1
B 2
C 3
file2
D 4
E 5
file3
F 6 %
G 7 &
H 8 #
I 9 #
J 10 ?
output:
A 1 D 4 F 6 %
B 2 E 5 G 7 &
C 3 H 8 #
I 9 #
J 10 ?

Isn't the default behaviour of paste exactly what you're asking for?
% paste <(echo "a
b
c
d") <(echo "1
2
3") <(echo "10
> 20
> 30
> 40
> 50
> 60")
a 1 10
b 2 20
c 3 30
d 40
50
60
%
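As for the multi-column follow-up: paste works on whole lines, not fields, so it behaves identically. One caveat the desired output glosses over: when a shorter file runs out, paste still emits its delimiter for that file, so later rows carry doubled or leading separators. A quick sketch using the second example's data (file names assumed):

```shell
# paste joins whole lines with the delimiter; exhausted files
# contribute empty strings, which leaves doubled/leading delimiters.
printf 'A 1\nB 2\nC 3\n' > file1
printf 'D 4\nE 5\n' > file2
printf 'F 6 %%\nG 7 &\nH 8 #\nI 9 #\nJ 10 ?\n' > file3
paste -d' ' file1 file2 file3
```

If the extra separators matter, post-process with something like tr -s ' ' or column -t.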

Related

See if 2 columns are 1 to 1 match in sql

I have a table in SQL.
I need to check whether columns A and B are a 1-to-1 match (meaning each value in A corresponds to exactly one value in B),
but as you can see below, they are not.
Only Cat A and Cat C have a single value in Col B.
So is there a way to identify this between the 2 columns?
A B C
Cat A asd 34
Cat A asd 56
Cat B dfg 67
Cat B ghj 7
Cat C ggh 78
Cat D ertrty 9
Cat D tyutyu 6
Cat D tuuiy 45
SELECT A, COUNT(DISTINCT B)
FROM your_table
GROUP BY A
HAVING COUNT(DISTINCT B) = 1;
This returns the categories in A whose B value is consistent; flip the HAVING condition to COUNT(DISTINCT B) > 1 to list the violating categories instead.

calculate current line column less previous line column using awk

my input
a 9
b 2
c 5
d 3
e 7
desired output (current line column 2 - previous line column 2)
a 9
b 2 -7
c 5 3
d 3 -2
e 7 4
explanation
a 9
b 2 -7 ( 2-9 = -7 )
c 5 3 ( 5-2 = 3 )
d 3 -2 ( 3-5 = -2 )
e 7 4 ( 7-3 = 4 )
I tried this without success:
awk '{ print $1, $2, $2 - $(NR-1) }' input
I want awk code to generate an additional column containing column 2 of the current line minus column 2 of the previous line.
You can try this awk:
$ awk 'NR==1{ print $0 } NR>1{ print $0,$2 - pre } { pre=$2 }' file
a 9
b 2 -7
c 5 3
d 3 -2
e 7 4
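For what it's worth, the original attempt failed because $(NR-1) means "field number NR-1 of the current line", not the previous line's value; the previous value has to be carried across lines in a variable, as above. A slightly more compact variant of the same idea (recreating the input with printf; the file name is assumed):

```shell
printf 'a 9\nb 2\nc 5\nd 3\ne 7\n' > input
# Overwrite a (new) third field with the running difference;
# the trailing `1` is awk shorthand for "print the line".
awk 'NR > 1 { $3 = $2 - pre } { pre = $2 } 1' input
```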

Print lines in both file when 2 different columns match

I have 2 tab delim files
file 1
B T 4 tab -
1 C 5 - cab
5 A 2 - ttt
D T 18 1111 -
file 2
K A 3 0.1
T B 4 0.3
P 1 5 0.5
P 5 2 0.11
I need to merge the two on file1 columns 1 and 3 matching file2 columns 2 and 3, and print the matching lines from both files. I'm expecting the following output:
B T 4 tab - T B 4 0.3
1 C 5 - cab P 1 5 0.5
5 A 2 - ttt P 5 2 0.11
I tried adapting from a similar question I had in the past:
awk 'NR==FNR {a[$1,$3] = $2"\t"$4"\t"$5; next} $2,$3 in a {print a[$1,$3],$0}' file1 file2
but with no success. The output I get looks like this, which is just file2 again:
K A 3 0.1
T B 4 0.3
P 1 5 0.5
P 5 2 0.11
There are two small problems in your code: the (key1, key2) in array membership test needs parentheses, and the print should look up the same key you tested, a[$2,$3] rather than a[$1,$3]:
awk 'NR==FNR{a[$1,$3]=$0; next} ($2,$3) in a {print a[$2,$3], $0}' file1 file2
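To double-check, here is that command run against the sample data (recreated with printf; awk's default field splitting handles tabs and spaces alike):

```shell
printf 'B T 4 tab -\n1 C 5 - cab\n5 A 2 - ttt\nD T 18 1111 -\n' > file1
printf 'K A 3 0.1\nT B 4 0.3\nP 1 5 0.5\nP 5 2 0.11\n' > file2
# store file1 lines keyed on (col1, col3); for each file2 line,
# look up (col2, col3) and print both lines on a match
awk 'NR==FNR{a[$1,$3]=$0; next} ($2,$3) in a {print a[$2,$3], $0}' file1 file2
```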

selecting specific lines using awk

I have a file with lines like these
1 1000034 G C 0.4 12
2 1000435 C G 0.1 52
3 0092943 A T 0.2 5
4 0092241 G A 0.3 34
etc.
columns 3 and 4 only contain the characters AGCT
I need to print lines that DO NOT contain both G and C in columns 3 and 4.
What I'm trying so far in awk is:
awk ' { if ($3!="G" && $4!="C") print }' file
but this also excludes lines with G and A in columns 3 and 4, respectively. I only want to exclude lines with G and C in columns 3 and 4, respectively.
I prefer to use awk for this problem.
One way:
awk '!($3=="G" && $4=="C")' file
This prints the inverse of the G & C combination.
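The original attempt excluded too much because $3!="G" && $4!="C" is, by De Morgan's law, the negation of ($3=="G" || $4=="C"): it drops every line where column 3 is G or column 4 is C. Negating the conjunction instead drops a line only when both conditions hold:

```shell
printf '1 1000034 G C 0.4 12\n2 1000435 C G 0.1 52\n3 0092943 A T 0.2 5\n4 0092241 G A 0.3 34\n' > file
# keep a line unless BOTH $3 is G and $4 is C
awk '!($3=="G" && $4=="C")' file
```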

combine 2 files with AWK based last colums [closed]

Closed 10 years ago.
I have two files
file1
-------------------------------
1 a t p b
2 b c f a
3 d y u b
2 b c f a
2 u g t c
2 b j h c
file2
--------------------------------
1 a b
2 p c
3 n a
4 4 a
I want to combine these 2 files based on their last columns (column 5 of file1 and column 3 of file2) using awk.
result
----------------------------------------------
1 a t p 1 a b
2 b c f 3 n a
2 b c f 4 4 a
3 d y u 1 a b
2 b c f 3 n a
2 b c f 4 4 a
2 u g t 2 p c
2 b j h 2 p c
At the very beginning I didn't notice the duplicated "a" key in file2; I thought it could be solved with plain array matching. The version below handles it.
An awk one-liner:
awk 'NR==FNR{a[$3"_"NR]=$0;next;}{for(x in a){if(x~"^"$5) print $1,$2,$3,$4,a[x];}}' f2.txt f1.txt
test
kent$ head *.txt
==> f1.txt <==
1 a t p b
2 b c f a
3 d y u b
2 b c f a
2 u g t c
2 b j h c
==> f2.txt <==
1 a b
2 p c
3 n a
4 4 a
kent$ awk 'NR==FNR{a[$3"_"NR]=$0;next;}{for(x in a){if(x~"^"$5) print $1,$2,$3,$4,a[x];}}' f2.txt f1.txt
1 a t p 1 a b
2 b c f 3 n a
2 b c f 4 4 a
3 d y u 1 a b
2 b c f 3 n a
2 b c f 4 4 a
2 u g t 2 p c
2 b j h 2 p c
Note: the output columns are not aligned, but it is acceptable if you pipe it to column -t.
Another way, assuming the files have no headers:
awk '
FNR == NR {
    f2[ $NF ] = f2[ $NF ] ? f2[ $NF ] SUBSEP $0 : $0;
    next;
}
FNR < NR {
    if ( $NF in f2 ) {
        len = split( f2[ $NF ], a, SUBSEP );
        for ( i = 1; i <= len; i++ ) {
            $NF = a[ i ];
            printf "%s\n", $0;
        }
    }
}
' file2 file1 | column -t
It yields:
1 a t p 1 a b
2 b c f 3 n a
2 b c f 4 4 a
3 d y u 1 a b
2 b c f 3 n a
2 b c f 4 4 a
2 u g t 2 p c
2 b j h 2 p c
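A self-contained check of this grouping approach. Note that the printf/print has to sit inside the for loop so that every grouped match is printed, and that split's return value gives the match count portably (length(array) is a gawk extension):

```shell
printf '1 a t p b\n2 b c f a\n3 d y u b\n2 b c f a\n2 u g t c\n2 b j h c\n' > file1
printf '1 a b\n2 p c\n3 n a\n4 4 a\n' > file2
awk '
FNR == NR { f2[$NF] = f2[$NF] ? f2[$NF] SUBSEP $0 : $0; next }
$NF in f2 {
    len = split(f2[$NF], a, SUBSEP)   # split the grouped rows back apart
    for (i = 1; i <= len; i++) {
        $NF = a[i]                    # replace the key with the matching row
        print
    }
}' file2 file1
```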
A bit easier in a language that supports arbitrary data structures (lists of lists). Here's Ruby:
# read "file2" and group its rows by their last field
file2 = File.foreach('file2').map(&:split).group_by { |fields| fields[-1] }
# process file1: print each row (minus its key) joined with every matching file2 row
File.foreach('file1').map(&:split).each do |fields|
  file2[fields[-1]].each do |fields2|
    puts (fields[0..-2] + fields2).join(" ")
  end
end
outputs
1 a t p 1 a b
2 b c f 3 n a
2 b c f 4 4 a
3 d y u 1 a b
2 b c f 3 n a
2 b c f 4 4 a
2 u g t 2 p c
2 b j h 2 p c