How do I add specific columns from different tables onto an existing table in PostgreSQL?

I have an original table (Table 1):
A B C D
1 3 5 7
2 4 6 8
I want to add column F from the table below (Table 2) onto Table 1:
A F G H
1 29 5 7
2 30 6 8
I also want to add columns J, L and O from the table below (Table 3) onto Table 1:
A I J K L M N O
1 9 11 13 15 17 19 21
2 10 12 14 16 18 20 22
How do I go about adding only the specific columns onto Table 1?
Expected result:
A B C D F J L O
1 3 5 7 29 11 15 21
2 4 6 8 30 12 16 22

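Use the following query. It joins the three tables on A and selects only the columns you need; it returns the combined result without changing table1 itself:
SELECT t1.A,
t1.B,
t1.C,
t1.D,
t2.F,
t3.J,
t3.L,
t3.O
FROM table1 t1
JOIN table2 t2
ON t1.A = t2.A
JOIN table3 t3
ON t1.A = t3.A;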
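To check the join against the sample data from the question, here is a minimal sketch that runs the same query from Python. It assumes an in-memory SQLite database as a stand-in for PostgreSQL (the query itself uses only standard join syntax), and the column types are guesses, since the question does not state them.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL for this demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (A INTEGER, B INTEGER, C INTEGER, D INTEGER);
    CREATE TABLE table2 (A INTEGER, F INTEGER, G INTEGER, H INTEGER);
    CREATE TABLE table3 (A INTEGER, I INTEGER, J INTEGER, K INTEGER,
                         L INTEGER, M INTEGER, N INTEGER, O INTEGER);
    INSERT INTO table1 VALUES (1, 3, 5, 7), (2, 4, 6, 8);
    INSERT INTO table2 VALUES (1, 29, 5, 7), (2, 30, 6, 8);
    INSERT INTO table3 VALUES (1, 9, 11, 13, 15, 17, 19, 21),
                              (2, 10, 12, 14, 16, 18, 20, 22);
""")

# Join on the shared key A and pick only the wanted columns from each table.
rows = conn.execute("""
    SELECT t1.A, t1.B, t1.C, t1.D, t2.F, t3.J, t3.L, t3.O
    FROM table1 t1
    JOIN table2 t2 ON t1.A = t2.A
    JOIN table3 t3 ON t1.A = t3.A
    ORDER BY t1.A
""").fetchall()

for row in rows:
    print(row)
```

Note that an inner join drops any row of table1 whose A value is missing from table2 or table3; a LEFT JOIN would keep such rows with NULLs in the added columns.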

Related

Add entries of a table into rows of another table

I have two tables
table a:
ID VALUE_z
1 41
2 32
3 51
table b:
ID TYPE VALUE
1 a 10
1 b 15
1 c 20
2 a 12
2 b 8
2 c 5
3 a 21
3 b 4
3 c 2
I want to add the rows from table a to the column VALUE in table b, based on the ID. The result should look like this:
table result:
ID TYPE VALUE
1 a 10
1 b 15
1 c 20
1 z 41
2 a 12
2 b 8
2 c 5
2 z 32
3 a 21
3 b 4
3 c 2
3 z 51
Try the following, using an INSERT INTO ... SELECT statement:
insert into tableB
select ID, 'z', VALUE_z
from tableA
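The statement can be tried out on the sample data with a short sketch like the one below. It assumes an in-memory SQLite database and assumes that tableB's columns are named ID, TYPE and VALUE (taken from the result table in the question); the explicit column list in the INSERT is an addition for safety, not part of the original answer.

```python
import sqlite3

# Assumption: in-memory SQLite; column names for tableB follow the
# result table shown in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (ID INTEGER, VALUE_z INTEGER);
    CREATE TABLE tableB (ID INTEGER, TYPE TEXT, VALUE INTEGER);
    INSERT INTO tableA VALUES (1, 41), (2, 32), (3, 51);
    INSERT INTO tableB VALUES
        (1, 'a', 10), (1, 'b', 15), (1, 'c', 20),
        (2, 'a', 12), (2, 'b', 8),  (2, 'c', 5),
        (3, 'a', 21), (3, 'b', 4),  (3, 'c', 2);
""")

# Append one 'z' row per ID, taking the value from tableA.
conn.execute("""
    INSERT INTO tableB (ID, TYPE, VALUE)
    SELECT ID, 'z', VALUE_z
    FROM tableA
""")

rows = conn.execute(
    "SELECT ID, TYPE, VALUE FROM tableB ORDER BY ID, TYPE"
).fetchall()
z_rows = [r for r in rows if r[1] == "z"]
print(z_rows)
```

Sorting by ID and TYPE reproduces the row order of the expected result, with each 'z' row placed after the 'a', 'b' and 'c' rows of the same ID.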

pandas: get top n including the duplicates of a sorted column

I have some data like the table below, which is sorted by the score column and then by the cat column:
score cat
18 B
18 A
17 A
16 B
16 A
15 B
14 B
13 A
12 A
10 B
9 B
I want to get the top 5 scores, including duplicates, and also add the rank, i.e.:
rank score cat
1 18 B
1 18 A
2 17 A
3 16 B
3 16 A
4 15 B
5 14 B
How can I get this using pandas?
Since the data frame is already sorted, try factorize:
df['rnk'] = df.score.factorize()[0]+1
out = df[df['rnk'] <= 5]
out
score cat rnk
0 18 B 1
1 18 A 1
2 17 A 2
3 16 B 3
4 16 A 3
5 15 B 4
6 14 B 5
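Why this works: factorize numbers each distinct value in order of first appearance, so on data already sorted by score those codes coincide with a dense rank. The dependency-free sketch below imitates that idea in plain Python; the helper name `dense_rank_by_first_seen` is made up for illustration and is not a pandas function.

```python
# Sample data from the question, already sorted by score (descending).
scores = [18, 18, 17, 16, 16, 15, 14, 13, 12, 10, 9]
cats = ["B", "A", "A", "B", "A", "B", "B", "A", "A", "B", "B"]


def dense_rank_by_first_seen(values):
    """Give each distinct value a 1-based code in order of first appearance."""
    codes = {}
    ranks = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes) + 1  # next unused code
        ranks.append(codes[value])
    return ranks


ranks = dense_rank_by_first_seen(scores)

# Keep every row whose rank is within the top 5, duplicates included.
top5 = [(r, s, c) for r, s, c in zip(ranks, scores, cats) if r <= 5]
for row in top5:
    print(row)
```

On unsorted data the first-appearance codes are no longer ranks; in that case pandas' `Series.rank(method="dense", ascending=False)` computes a dense rank without relying on row order.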

Join information from table A with the information from another table B multiple times

This may be a simple question, but I would like to learn whether it can be done in one query.
Table A: contains gene information
gene start end
1 a 5 0
2 b 6 1
3 c 7 2
4 d 8 3
5 e 9 4
6 f 10 5
7 g 11 6
8 h 12 7
9 i 13 8
10 j 14 9
Table B: contains calculated gene information.
gene1 gene2 cor
1 d j -0.7600805
2 c i 0.4274278
3 e g -0.9249361
4 a f 0.8567928
5 b h -0.3018518
6 d j -0.3723553
7 c i 0.1617981
8 e g 0.8575933
9 a f 0.8409788
10 b h 0.1506035
The result table I'm trying to get is:
gene1 gene2 cor start1 end1 start2 end2
1 d j -0.7600805 8 3 14 9
2 c i 0.4274278 7 2 13 8
3 e g -0.9249361
4 a f 0.8567928
5 b h -0.3018518
6 d j -0.3723553 etc.
7 c i 0.1617981
8 e g 0.8575933
9 a f 0.8409788
10 b h 0.1506035
The method I can think of is to join table A onto table B twice, first by gene1 and then by gene2, which would require an intermediate table. Is there a simpler way to achieve this in one step?
Yes, two joins will do it, and no intermediate table is needed. Join TableA twice under different aliases:
SELECT b.Gene1
,b.Gene2
,b.cor
,a1.Start AS Start1
,a1.End AS End1
,a2.Start AS Start2
,a2.End AS End2
FROM TableB b
INNER JOIN TableA a1
ON a1.Gene = b.Gene1
INNER JOIN TableA a2
ON a2.Gene = b.Gene2
Depending on your DBMS you may need to tweak the syntax a bit; for example, End is a reserved word in several systems and may need quoting.
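The double join can be checked on the sample data with a sketch like the following. It assumes an in-memory SQLite database, quotes the `end` column because END is a keyword there, and adds an `id` column to TableB (taken from the row numbers in the question) only to get a stable output order. Only the first three TableB rows are loaded to keep the example short.

```python
import sqlite3

# Assumption: in-memory SQLite; "end" is quoted because END is a keyword.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE TableA (gene TEXT, start INTEGER, "end" INTEGER)')
conn.execute("CREATE TABLE TableB (id INTEGER, gene1 TEXT, gene2 TEXT, cor REAL)")

conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?)",
    [("a", 5, 0), ("b", 6, 1), ("c", 7, 2), ("d", 8, 3), ("e", 9, 4),
     ("f", 10, 5), ("g", 11, 6), ("h", 12, 7), ("i", 13, 8), ("j", 14, 9)],
)
conn.executemany(
    "INSERT INTO TableB VALUES (?, ?, ?, ?)",
    [(1, "d", "j", -0.7600805), (2, "c", "i", 0.4274278), (3, "e", "g", -0.9249361)],
)

# TableA is joined twice: alias a1 resolves gene1, alias a2 resolves gene2.
rows = conn.execute("""
    SELECT b.gene1, b.gene2, b.cor,
           a1.start AS start1, a1."end" AS end1,
           a2.start AS start2, a2."end" AS end2
    FROM TableB b
    INNER JOIN TableA a1 ON a1.gene = b.gene1
    INNER JOIN TableA a2 ON a2.gene = b.gene2
    ORDER BY b.id
""").fetchall()

for row in rows:
    print(row)
```

The first two output rows match the start/end values filled in by hand in the question's result table.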

Set one row fields as a multiplication of 2 others

I have an SQL table with the following structure:
Id A B C D
1 1 5 6 25
2 2 10 5 25
3 3 7 4 25
4 1 6 5 26
5 2 10 5 26
6 3 8 3 26
I want to write a script that updates the B and C columns in the rows with A = 3, setting each to the product of the corresponding values from the rows with A = 1 and A = 2 (for the same value of column D).
So the result should be:
Id A B C D
1 1 5 6 25
2 2 10 5 25
3 3 50 30 25
4 1 6 5 26
5 2 10 5 26
6 3 60 25 26
How can I write this in SQL?
One possible way is joining table to itself twice:
update T3
set
T3.B = T1.B * T2.B,
T3.C = T1.C * T2.C
from [Table] T3
join [Table] T1 on T1.A = 1 and T1.D = T3.D
join [Table] T2 on T2.A = 2 and T2.D = T3.D
where
T3.A = 3
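The UPDATE ... FROM form above is SQL Server style and is not accepted everywhere. As a portable variation (a swapped-in technique, not the answer's own query), the sketch below gets the same result with correlated subqueries. It assumes an in-memory SQLite database and a hypothetical table name `items`.

```python
import sqlite3

# Assumption: in-memory SQLite; the table name "items" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (Id INTEGER PRIMARY KEY, A INTEGER, B INTEGER, C INTEGER, D INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 5, 6, 25), (2, 2, 10, 5, 25), (3, 3, 7, 4, 25),
     (4, 1, 6, 5, 26), (5, 2, 10, 5, 26), (6, 3, 8, 3, 26)],
)

# For each A = 3 row, look up the A = 1 and A = 2 rows with the same D
# and store the product of their B (and C) values.
conn.execute("""
    UPDATE items
    SET B = (SELECT t1.B * t2.B
             FROM items t1 JOIN items t2 ON t2.D = t1.D
             WHERE t1.A = 1 AND t2.A = 2 AND t1.D = items.D),
        C = (SELECT t1.C * t2.C
             FROM items t1 JOIN items t2 ON t2.D = t1.D
             WHERE t1.A = 1 AND t2.A = 2 AND t1.D = items.D)
    WHERE A = 3
""")

rows = conn.execute("SELECT Id, A, B, C, D FROM items ORDER BY Id").fetchall()
for row in rows:
    print(row)
```

One difference from the join version: if a D group has no A = 1 or A = 2 row, the subquery returns nothing and B and C are set to NULL, whereas the inner joins would leave that row untouched.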

Dedup using HiveQL

I have a Hive table with fields 'a' (int), 'b' (string), 'c' (bigint), 'd' (bigint) and 'e' (string).
I have data like:
a b c d e
---------------
1 a 10 18 i
2 b 11 19 j
3 c 12 20 k
4 d 13 21 l
1 e 14 22 m
4 f 15 23 n
2 g 16 24 o
3 h 17 25 p
The table is sorted on key 'b'.
We want output like the following:
a b c d e
---------------
1 e 14 22 m
4 f 15 23 n
2 g 16 24 o
3 h 17 25 p
That is, the table is deduplicated on key 'a', keeping the row with the last (latest) 'b'.
Is this possible with a Hive query (HiveQL)?
If column b is unique, try the following HQL (table stands for your table name; t.* keeps the helper column max_b out of the output):
select
t.*
from
(
select max(b) as max_b
from
table
group by a
) table1
join table t on table1.max_b = t.b
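The max-per-group join can be checked on the sample data with the sketch below. It assumes an in-memory SQLite database as a stand-in for Hive (the query uses only a grouped subquery and a join) and a hypothetical table name `events`, since the question does not name the table.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Hive; "events" is a
# hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (a INTEGER, b TEXT, c INTEGER, d INTEGER, e TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [(1, "a", 10, 18, "i"), (2, "b", 11, 19, "j"), (3, "c", 12, 20, "k"),
     (4, "d", 13, 21, "l"), (1, "e", 14, 22, "m"), (4, "f", 15, 23, "n"),
     (2, "g", 16, 24, "o"), (3, "h", 17, 25, "p")],
)

# Find the largest b per value of a, then join back to fetch the full row.
rows = conn.execute("""
    SELECT t.a, t.b, t.c, t.d, t.e
    FROM (SELECT MAX(b) AS max_b FROM events GROUP BY a) latest
    JOIN events t ON t.b = latest.max_b
    ORDER BY t.b
""").fetchall()

for row in rows:
    print(row)
```

The join matches on b alone, which is why the answer requires b to be unique: if the same b value appeared under two different values of a, extra rows could be returned.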