Join two tables with no relation postgres? - sql

I have this statement:
SELECT t1.*, t2.* FROM
(SELECT m.* FROM microposts AS m) AS t1
FULL JOIN
(SELECT r.* FROM ratings AS r) AS t2
ON true
I am using Rails and querying the database with raw SQL, but the output drops duplicate-named columns (e.g. user_id) from the second table, and each row still pairs values from the second table with values from the first, even though there is no relation between them. For example:
+------+-----------+------+--------+
| m.id | m.content | r.id | rating |
+------+-----------+------+--------+
| 1    | "hello"   | 10   | 5      |
+------+-----------+------+--------+
There is no relation between tables m and r.
I would like output something like this:
+------+-----------+------+--------+
| m.id | m.content | r.id | rating |
+------+-----------+------+--------+
| 1    | "hello"   | null | null   |
| null | null      | 5    | 4      |
| 2    | "gday"    | null | null   |
+------+-----------+------+--------+
....................... etc

This is a rather exotic way to say UNION ALL:
SELECT t1.*, t2.*
FROM
(SELECT m.* FROM microposts AS m) AS t1
FULL JOIN
(SELECT r.* FROM ratings AS r) AS t2
ON false
By contrast, ON true will create a Cartesian product.
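For comparison, here is a minimal sketch of the plain UNION ALL form that yields the same row shape. It runs through Python's built-in sqlite3 module against made-up sample rows, so SQLite stands in for PostgreSQL here; the column aliases (m_id, r_id, ...) are assumptions added for illustration.

```python
import sqlite3

# Hypothetical sample rows; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE microposts (id INTEGER, content TEXT);
    CREATE TABLE ratings (id INTEGER, rating INTEGER);
    INSERT INTO microposts VALUES (1, 'hello'), (2, 'gday');
    INSERT INTO ratings VALUES (5, 4);
""")

# Each branch pads the other table's columns with NULL, which is the
# row shape that FULL JOIN ... ON false produces.
rows = conn.execute("""
    SELECT m.id AS m_id, m.content AS m_content, NULL AS r_id, NULL AS rating
    FROM microposts AS m
    UNION ALL
    SELECT NULL, NULL, r.id, r.rating
    FROM ratings AS r
""").fetchall()

# With both tables non-empty, ON true behaves like a cross join:
# every micropost is paired with every rating.
cross_rows = conn.execute(
    "SELECT m.id, r.id FROM microposts AS m CROSS JOIN ratings AS r"
).fetchall()

for row in rows:
    print(row)
```

Explicit aliases also sidestep the duplicate column name problem from the question, since every output column gets a unique name.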

Related

What's the best way to create empty/default rows for missing aggregations

I have a table I want to group over two levels. As the output, I need all the grouping value combinations, so that I end up with zeros where nonexistent combinations occur. For example, say I have this table:
+------+------+
| user | page |
+------+------+
| a    | 1    |
| a    | 1    |
| a    | 2    |
| b    | 2    |
| b    | 3    |
+------+------+
I'm after output like this:
+------+------+--------+
| user | page | visits |
+------+------+--------+
| a    | 1    | 2      |
| a    | 2    | 1      |
| a    | 3    | 0      |
| b    | 1    | 0      |
| b    | 2    | 1      |
| b    | 3    | 1      |
+------+------+--------+
I can achieve this with the following query, but it seems rather heavy handed:
WITH
users AS (SELECT distinct(user) FROM sometable),
pages AS (SELECT distinct(page) FROM sometable),
users_pages_empty AS (SELECT * FROM users CROSS JOIN pages),
users_pages_full AS (SELECT user, page, count(*) as visits FROM sometable GROUP BY user, page)
SELECT e.user, e.page, coalesce(f.visits, 0) as visits
FROM users_pages_empty e
LEFT JOIN users_pages_full f ON e.user=f.user AND e.page=f.page
I happen to be using AWS Athena, but I think this is more a generic SQL question than an Athena question.
The performance of this query is fine; it's the readability and complexity I'm not happy with.
Use a cross join to generate the rows and a left join to bring in the existing rows and aggregate:
select u.user, p.page, count(s.user) as visits
from (select distinct user from sometable) u cross join
(select distinct page from sometable) p left join
sometable s
on s.user = u.user and s.page = p.page
group by u.user, p.page
order by u.user, p.page;
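As a quick check, the query can be run against the question's sample rows. This is a minimal sketch using Python's built-in sqlite3 module, so SQLite stands in for Athena here; the visits alias is an addition for readability.

```python
import sqlite3

# Sample rows copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sometable (user TEXT, page INTEGER);
    INSERT INTO sometable VALUES ('a', 1), ('a', 1), ('a', 2), ('b', 2), ('b', 3);
""")

# The cross join builds every (user, page) combination; the left join attaches
# matching visit rows; count(s.user) ignores NULLs, so missing pairs yield 0.
visits = conn.execute("""
    SELECT u.user, p.page, count(s.user) AS visits
    FROM (SELECT DISTINCT user FROM sometable) u
    CROSS JOIN (SELECT DISTINCT page FROM sometable) p
    LEFT JOIN sometable s ON s.user = u.user AND s.page = p.page
    GROUP BY u.user, p.page
    ORDER BY u.user, p.page
""").fetchall()

for row in visits:
    print(row)
```

Counting s.user rather than * is what makes the empty combinations come out as 0 instead of 1.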

Fill table (PK column A, PK column B) with table of elements B, where column A does not have B - list pair

I receive Table2 (PK Text) filled with the elements FIRST, SECOND, THIRD, FORTH.
I need to insert these elements into TABLE1 wherever they are missing.
TABLE1
+----+--------+
| ID | Text   |
+----+--------+
| A  | FIRST  |
| A  | SECOND |
| A  | THIRD  |
| B  | FIRST  |
| B  | THIRD  |
| C  | FIRST  |
+----+--------+
So ID A is missing FORTH, and
| A | FORTH |
should be inserted.
B is missing SECOND and FORTH, and so on.
The result should look something like this:
+----+--------+
| ID | Text   |
+----+--------+
| A  | FIRST  |
| A  | SECOND |
| A  | THIRD  |
| A  | FORTH  |
| B  | FIRST  |
| B  | SECOND |
| B  | THIRD  |
| B  | FORTH  |
| C  | FIRST  |
| C  | SECOND |
| C  | THIRD  |
| C  | FORTH  |
+----+--------+
You can cross join the texts from table2 with the distinct ids available in table1 and filter on the missing tuples with a not exists condition, like so:
insert into table1(id, text)
select t1.id, t2.text
from table2 t2
cross join (select distinct id from table1) t1
where not exists (
select 1
from table1 t11
where t11.id = t1.id and t11.text = t2.text
)
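To see the effect on the question's data, here is a minimal sketch that runs this insert through Python's built-in sqlite3 module. SQLite is only a stand-in for whatever database is actually in use, and the composite primary key on table1 is an assumption taken from the question title.

```python
import sqlite3

# Sample data copied from the question ('FORTH' is spelled as in the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (text TEXT PRIMARY KEY);
    CREATE TABLE table1 (id TEXT, text TEXT, PRIMARY KEY (id, text));
    INSERT INTO table2 VALUES ('FIRST'), ('SECOND'), ('THIRD'), ('FORTH');
    INSERT INTO table1 VALUES
        ('A', 'FIRST'), ('A', 'SECOND'), ('A', 'THIRD'),
        ('B', 'FIRST'), ('B', 'THIRD'),
        ('C', 'FIRST');
""")

before = conn.execute("SELECT count(*) FROM table1").fetchone()[0]

# Build every (id, text) combination, then keep only those not yet in table1.
conn.execute("""
    INSERT INTO table1 (id, text)
    SELECT t1.id, t2.text
    FROM table2 t2
    CROSS JOIN (SELECT DISTINCT id FROM table1) t1
    WHERE NOT EXISTS (
        SELECT 1 FROM table1 t11
        WHERE t11.id = t1.id AND t11.text = t2.text
    )
""")

after = conn.execute("SELECT count(*) FROM table1").fetchone()[0]
b_texts = {r[0] for r in conn.execute("SELECT text FROM table1 WHERE id = 'B'")}
print(before, after, sorted(b_texts))
```

Because the not exists filter excludes pairs that already exist, rerunning the statement inserts nothing and does not violate the primary key.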
Cross join the distinct IDs with the distinct Texts and left join the table to get the unmatched rows of the table:
INSERT INTO TABLE1 (ID, Text)
SELECT i.ID, d.Text
FROM (SELECT DISTINCT ID FROM TABLE1) i
CROSS JOIN (SELECT DISTINCT Text FROM TABLE1) d
LEFT JOIN TABLE1 t ON t.ID = i.ID AND t.Text = d.Text
WHERE t.ID IS NULL
If any of the values 'FIRST', 'SECOND', 'THIRD' or 'FOURTH' might be missing from the table entirely, use this instead:
INSERT INTO TABLE1 (ID, Text)
SELECT i.ID, d.Text
FROM (SELECT DISTINCT ID FROM TABLE1) i
CROSS JOIN (
SELECT 'FIRST' Text UNION ALL SELECT 'SECOND'
UNION ALL SELECT 'THIRD' UNION ALL SELECT 'FOURTH'
) d
LEFT JOIN TABLE1 t ON t.ID = i.ID AND t.Text = d.Text
WHERE t.ID IS NULL

PostgreSQL aggregate union, intersection and set differences

I have a table of pairs to aggregate as follows:
+---------+----------+
| left_id | right_id |
+---------+----------+
| a | b |
+---------+----------+
| a | c |
+---------+----------+
And a table of values as so:
+----+-------+
| id | value |
+----+-------+
| a | 1 |
+----+-------+
| a | 2 |
+----+-------+
| a | 3 |
+----+-------+
| b | 1 |
+----+-------+
| b | 4 |
+----+-------+
| b | 5 |
+----+-------+
| c | 1 |
+----+-------+
| c | 2 |
+----+-------+
| c | 3 |
+----+-------+
| c | 4 |
+----+-------+
For each pair, I would like to calculate the length of the union, intersection and set differences (each way) comparing the values, so that the output would look like this:
+---------+----------+-------+--------------+-----------+------------+
| left_id | right_id | union | intersection | left_diff | right_diff |
+---------+----------+-------+--------------+-----------+------------+
| a       | b        | 5     | 1            | 2         | 2          |
+---------+----------+-------+--------------+-----------+------------+
| a       | c        | 4     | 3            | 0         | 1          |
+---------+----------+-------+--------------+-----------+------------+
What would be the best way to approach this using PostgreSQL?
UPDATE: here is a rextester link with data https://rextester.com/RWID9864
You need scalar subqueries for that.
The UNION can also be expressed by an OR which makes that query somewhat shorter to write. But for the intersection you need a query that is a bit longer.
To calculate the "diff", use the except operator:
SELECT p.*,
(select count(distinct value) from "values" where id in (p.left_id, p.right_id)) as "union",
(select count(*)
from (
select v.value from "values" v where id = p.left_id
intersect
select v.value from "values" v where id = p.right_id
) t) as intersection,
(select count(*)
from (
select v.value from "values" v where id = p.left_id
except
select v.value from "values" v where id = p.right_id
) t) as left_diff,
(select count(*)
from (
select v.value from "values" v where id = p.right_id
except
select v.value from "values" v where id = p.left_id
) t) as right_diff
from pairs p
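As a sanity check on what these four subqueries should return, the same set arithmetic can be reproduced outside the database. This is only a sketch in plain Python over the question's sample data; the pair_stats helper is a hypothetical name, not part of any answer.

```python
# Sample data copied from the question's values table, grouped by id.
values = {
    "a": {1, 2, 3},
    "b": {1, 4, 5},
    "c": {1, 2, 3, 4},
}
pairs = [("a", "b"), ("a", "c")]


def pair_stats(left_id, right_id):
    """Return (union, intersection, left_diff, right_diff) sizes for one pair."""
    left, right = values[left_id], values[right_id]
    return (
        len(left | right),   # union
        len(left & right),   # intersect
        len(left - right),   # left except right
        len(right - left),   # right except left
    )


stats = {pair: pair_stats(*pair) for pair in pairs}
for pair, counts in stats.items():
    print(pair, counts)
```

The numbers agree with the expected output table in the question, which makes them a handy reference when comparing the SQL variants.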
I don't know what causes your slowness, as I cannot see the table sizes or the explain plans. Presuming both tables are large enough that nested loops are inefficient and that joining values to itself is out of the question, I'd try to rewrite it without scalar subqueries, like this:
select p.*,
coalesce(stats."union", 0) "union",
coalesce(stats.intersection, 0) intersection,
coalesce(stats.left_cnt - stats.intersection, 0) left_diff,
coalesce(stats.right_cnt - stats.intersection, 0) right_diff
from pairs p
left join (
select left_id,
right_id,
count(*) "union",
count(has_left and has_right) intersection,
count(has_left) left_cnt,
count(has_right) right_cnt
from (
select p.*,
v."value" the_value,
true has_left
from pairs p
join "values" v on v.id = p.left_id
) l
full join (
select p.*,
v."value" the_value,
true has_right
from pairs p
join "values" v on v.id = p.right_id
) r using(left_id, right_id, the_value)
group by left_id,
right_id
) stats on p.left_id = stats.left_id
and p.right_id = stats.right_id;
Each join condition here allows hash and/or merge join, so the planner will have a chance to avoid nested loops.

Three table join where middle table has duplicate foreign keys

I am working with a database with a structure similar to the illustration below (except with more columns). Basically, each person has a unique person_id and alt_id. However, the only thing connecting table A to table C is table B, and table B has one to many rows for each person/alt_id.
I need to get rows with each person_id, its alt_id, and the associated shapes.
I could do this:
SELECT DISTINCT a.person_id, a.color, b.alt_id, c.shape
FROM a
JOIN b ON a.person_id = b.person_id
JOIN c ON b.alt_id = c.alt_id
However, that seems inefficient as it will take a Cartesian product of rows from B and C with the same alt_id before finally using DISTINCT to narrow the results down. What's the best/most efficient way to do this query?
Table A
+-----------+-------+
| person_id | color |
+-----------+-------+
| 10        | red   |
| 11        | blue  |
| 12        | green |
+-----------+-------+
Table B
+-----------+--------+
| person_id | alt_id |
+-----------+--------+
| 10        | 225    |
| 10        | 225    |
| 11        | 226    |
| 11        | 226    |
| 11        | 226    |
| 12        | 227    |
+-----------+--------+
Table C
+--------+----------+
| alt_id | shape    |
+--------+----------+
| 225    | square   |
| 226    | circle   |
| 226    | rhombus  |
| 226    | ellipse  |
| 227    | triangle |
+--------+----------+
Join to (select distinct * from b) b rather than just the base table b.
SELECT
a.person_id, a.color, b.alt_id, c.shape
FROM
a
INNER JOIN (select distinct * from b) b
ON a.person_id = b.person_id
INNER JOIN c
ON b.alt_id = c.alt_id
You can get a distinct list of values from b before you do your joins.
SELECT DISTINCT a.person_id, a.color, b.alt_id, c.shape
FROM a
JOIN (Select Distinct person_id, alt_id from b) b ON a.person_id = b.person_id
JOIN c ON b.alt_id = c.alt_id
Note that because of indexes, and statistics, getting a DISTINCT list is not always a good idea. Look at the actual execution plan to evaluate how good this is, especially if you have a lot of data.
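To make the difference concrete, here is a minimal sketch comparing the plain join with the pre-deduplicated join on the question's sample data. It uses Python's built-in sqlite3 module as a stand-in engine, so it illustrates row counts only, not the execution plans of your actual database.

```python
import sqlite3

# Sample data copied from the question's tables A, B and C.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (person_id INTEGER, color TEXT);
    CREATE TABLE b (person_id INTEGER, alt_id INTEGER);
    CREATE TABLE c (alt_id INTEGER, shape TEXT);
    INSERT INTO a VALUES (10, 'red'), (11, 'blue'), (12, 'green');
    INSERT INTO b VALUES (10, 225), (10, 225), (11, 226), (11, 226), (11, 226), (12, 227);
    INSERT INTO c VALUES (225, 'square'), (226, 'circle'), (226, 'rhombus'),
                         (226, 'ellipse'), (227, 'triangle');
""")

# Plain join without DISTINCT: duplicates in b multiply the matching rows of c.
plain = conn.execute("""
    SELECT a.person_id, a.color, b.alt_id, c.shape
    FROM a
    JOIN b ON a.person_id = b.person_id
    JOIN c ON b.alt_id = c.alt_id
""").fetchall()

# Deduplicating b first keeps the intermediate result small.
deduped = conn.execute("""
    SELECT a.person_id, a.color, b.alt_id, c.shape
    FROM a
    JOIN (SELECT DISTINCT person_id, alt_id FROM b) b ON a.person_id = b.person_id
    JOIN c ON b.alt_id = c.alt_id
""").fetchall()

print(len(plain), len(deduped))
```

On this data the plain join produces 12 rows before any DISTINCT is applied, while the deduplicated join produces the 5 wanted rows directly.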
You could use aggregation along with a common table expression (or subquery, but a CTE might be neater):
WITH ab AS (
SELECT a.person_id, a.color, MAX(b.alt_id) AS alt_id
FROM a INNER JOIN b
ON a.person_id = b.person_id
GROUP BY a.person_id, a.color
)
SELECT ab.person_id, ab.color, ab.alt_id, c.shape
FROM ab INNER JOIN c ON ab.alt_id = c.alt_id;

Left Join to select all record from table1 and single (optional) record of table2

In my database I have two tables with a one-to-many (optional) relationship, 1 to 0..*.
Table1:
+--+---------+
|id| name |
+--+---------+
| 1| user1 |
| 2| user2 |
| 3| user3 |
+--+---------+
Table2
+--+------+-------+
|id|tb1_ID|city |
+--+------+-------+
| 1| 1 | a |
| 2| 1 | b |
| 3| 2 | c |
+--+------+-------+
Now I want all records from Table1 and the top 1 row of Table2 for each Table1 row,
like this:
+----+------+----+--------+---------+
|p.id|p.name|c.id|c.tb1_ID|c.city |
+----+------+----+--------+---------+
| 1 | user1| 1 | 1 | a |
| 2 | user2| 3 | 2 | c |
| 3 | user3|null| null | null |
+----+------+----+--------+---------+
How can I do this?
For example, with a common table expression (WITH) and the ROW_NUMBER function:
WITH cte AS(
SELECT t1.id AS t1ID
, t1.name
, t2.id AS t2ID
, t2.tb1_ID
, t2.city
, ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t2.id) AS t1RowNum
FROM Table1 t1 LEFT OUTER JOIN Table2 t2 ON t1.id=t2.tb1_ID
)
SELECT cte.*
FROM cte
WHERE t1RowNum = 1
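Here is a minimal sketch that runs this approach against the question's sample data. It uses Python's built-in sqlite3 module as a stand-in engine and assumes the bundled SQLite is 3.25 or newer, since older versions lack window functions; the final ORDER BY is an addition to make the output deterministic.

```python
import sqlite3

# Sample data copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER, name TEXT);
    CREATE TABLE Table2 (id INTEGER, tb1_ID INTEGER, city TEXT);
    INSERT INTO Table1 VALUES (1, 'user1'), (2, 'user2'), (3, 'user3');
    INSERT INTO Table2 VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")

# Number the Table2 matches per Table1 row, then keep only the first one.
# Rows of Table1 without a match still get row number 1, with NULLs on the right.
rows = conn.execute("""
    WITH cte AS (
        SELECT t1.id AS t1ID, t1.name, t2.id AS t2ID, t2.tb1_ID, t2.city,
               ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t2.id) AS t1RowNum
        FROM Table1 t1
        LEFT OUTER JOIN Table2 t2 ON t1.id = t2.tb1_ID
    )
    SELECT t1ID, name, t2ID, tb1_ID, city
    FROM cte
    WHERE t1RowNum = 1
    ORDER BY t1ID
""").fetchall()

for row in rows:
    print(row)
```

The ORDER BY inside the window decides which Table2 row counts as the "top 1"; here it is the lowest Table2 id.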
You'll have to use a subquery with OUTER APPLY to isolate the single row in the right table.
select t1.*, t2.*
from table1 t1
outer apply
(
select top 1 *
from table2
where tb1_id = t1.id
order by id
) as t2
Assuming you have a 1 to many relationship you want to use a LEFT OUTER JOIN:
SELECT p.id, p.name, c.id, c.tb1_ID, c.city
FROM Table1 p
LEFT OUTER JOIN Table2 c
ON p.id = c.tb1_ID
If you have a many-to-many relationship, you'll need to decide how you're limiting Table2.