Postgres SQL - Efficiently writing a multistep join query in psql


I am trying to write a multi-step join query for a few time series tables in psql.
The simplified tables look like the below, whereas in reality they can contain thousands of unique lookup_ids and thousands of unique dates per lookup_id:
table_a =
lookup_id  date        a_value
A          2021-01-03  50
A          2021-01-04  51
A          2021-01-05  52
table_b =
lookup_id  date_1      date_2      b_value
A          2021-01-03  2021-01-05  5
table_c =
lookup_id  date        c_value
A          2015-01-01  A_1
A          2021-01-05  A_2
The final joined table should look like the below:
table_joined =
lookup_id  date        a_value  b1_value  b2_value  c_value
A          2021-01-03  50       5         0         A_1
A          2021-01-04  51       0         0         A_1
A          2021-01-05  52       0         5         A_2
I am stuck on two parts: 1) joining the same 'b_value' column from table_b one-to-one on two separate date columns (date_1 and date_2), with a different output name for each match ('b1_value' and 'b2_value'), and 2) getting functionality similar to an Excel XLOOKUP "exact match or next largest item" for the dates in table_c, as a one-to-many match on lookup_id and date.
For part 1, I have tried:
SELECT
table_a.lookup_id,
table_a.date,
table_a.a_value,
table_b.b_value
FROM
table_a
LEFT JOIN table_b
ON table_a.lookup_id = table_b.lookup_id
AND table_a.date = table_b.date_1
LEFT JOIN table_b
ON table_a.lookup_id = table_b.lookup_id
AND table_a.date = table_b.date_2
WHERE
table_a.lookup_id in ('A');
And for part 2 I haven't found a great place to start.
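One possible way to approach both parts, sketched here rather than taken from an answer in the thread: join table_b twice under two different aliases (part 1), and pick the most recent table_c row on or before each date with a correlated subquery (part 2). The sketch below runs the SQL through Python's sqlite3 module purely so it is self-contained; the aliases b1, b2 and c are illustrative, SQLite stands in for Postgres, and dates are stored as ISO text as an assumption. In Postgres the same SELECT should work with proper date columns.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (lookup_id TEXT, date TEXT, a_value INTEGER);
CREATE TABLE table_b (lookup_id TEXT, date_1 TEXT, date_2 TEXT, b_value INTEGER);
CREATE TABLE table_c (lookup_id TEXT, date TEXT, c_value TEXT);
INSERT INTO table_a VALUES
  ('A', '2021-01-03', 50), ('A', '2021-01-04', 51), ('A', '2021-01-05', 52);
INSERT INTO table_b VALUES ('A', '2021-01-03', '2021-01-05', 5);
INSERT INTO table_c VALUES ('A', '2015-01-01', 'A_1'), ('A', '2021-01-05', 'A_2');
""")

query = """
SELECT a.lookup_id,
       a.date,
       a.a_value,
       COALESCE(b1.b_value, 0) AS b1_value,   -- match on date_1, 0 if none
       COALESCE(b2.b_value, 0) AS b2_value,   -- match on date_2, 0 if none
       (SELECT c.c_value                      -- latest table_c row on or before a.date
          FROM table_c c
         WHERE c.lookup_id = a.lookup_id
           AND c.date <= a.date
         ORDER BY c.date DESC
         LIMIT 1) AS c_value
  FROM table_a a
  LEFT JOIN table_b b1
         ON b1.lookup_id = a.lookup_id AND b1.date_1 = a.date
  LEFT JOIN table_b b2
         ON b2.lookup_id = a.lookup_id AND b2.date_2 = a.date
 WHERE a.lookup_id IN ('A')
 ORDER BY a.lookup_id, a.date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The two aliases are what the attempted query is missing: without them the second reference to table_b is ambiguous. For large tables, an index on table_c (lookup_id, date) would likely help the subquery, though that has not been measured here.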

Related

Selecting max value on subset of data based on other column's value

I'm looking to left join a value from a subset of data from another table, based on a specific value from the first table. Here are example tables:
table1
-----------------
key date
1 2020-01-02
2 2020-03-02
table2
-----------------
key date value
1 2019-12-13 a
1 2019-12-29 b
1 2020-01-14 c
1 2020-02-02 d
2 2019-11-01 e
2 2019-12-02 f
2 2020-04-29 g
Based on the value of date for a specific key value from table1, I want to select the value from the most recent row (MAX(date)) in table2, considering only the rows for that key value whose date is on or before the date from table1.
So, the resulting table would look like this:
key date value
1 2020-01-02 b
2 2020-03-02 f
I'm thinking I could use some type of logic that would create temp tables for each key value where temp.date <= table1.date, then select MAX(temp.date) from the temp table and left join the value. For example, the temp table for key = 1 would be:
key date value
1 2019-12-13 a
1 2019-12-29 b
Then it would left join the value b for key = 1, since MAX(date) = 2019-12-29. I'm not sure if this is the right logic to go about my problem; any help would be greatly appreciated!

You can use a correlated subquery:
select t1.*,
(select t2.value
from table2 t2
where t2.key = t1.key and t2.date <= t1.date
order by t2.date desc
fetch first 1 row only
) as value
from table1 t1;
Note that not all databases support the standard fetch first clause. You may need to use limit or select top (1) or something else depending on your database.
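As a rough check of the correlated subquery against the sample data, here is a small sketch that runs it in SQLite through Python's sqlite3 module. SQLite is only a stand-in, and it is one of the databases that needs LIMIT 1 in place of fetch first 1 row only; the column name key is quoted as a precaution, and dates are stored as ISO text as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 ("key" INTEGER, date TEXT);
CREATE TABLE table2 ("key" INTEGER, date TEXT, value TEXT);
INSERT INTO table1 VALUES (1, '2020-01-02'), (2, '2020-03-02');
INSERT INTO table2 VALUES
  (1, '2019-12-13', 'a'), (1, '2019-12-29', 'b'),
  (1, '2020-01-14', 'c'), (1, '2020-02-02', 'd'),
  (2, '2019-11-01', 'e'), (2, '2019-12-02', 'f'), (2, '2020-04-29', 'g');
""")

# For each table1 row, take the newest table2 row with the same key
# whose date is on or before the table1 date.
latest_values = conn.execute("""
SELECT t1."key",
       t1.date,
       (SELECT t2.value
          FROM table2 t2
         WHERE t2."key" = t1."key" AND t2.date <= t1.date
         ORDER BY t2.date DESC
         LIMIT 1) AS value
  FROM table1 t1
 ORDER BY t1."key"
""").fetchall()

for row in latest_values:
    print(row)
```

If no table2 row qualifies for a key, the subquery yields NULL, which matches the left-join behaviour asked for in the question.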

Full Join And Distribute Evenly Based on Count in SQL

I have two tables that I want to join:
Table A
Date Gran1 Gran2 Gran3
1/1/18 A B CD
1/1/18 A B EF
1/2/18 A B GF
1/2/18 A B EF
1/2/18 A B FR
1/2/18 A L EF
Table B
Date Gran1 Gran2 Value1 Value2
1/1/18 A B 100 150
1/2/18 A B 200 80
1/2/18 A L 500 30
Table B does not have the same granularity as Table A. I want to join Table B to Table A and divide the joined values evenly by the number of rows in Table A that share the same Date, Gran1, and Gran2.
My final result should look like this:
Date Gran1 Gran2 Gran3 Value1 Value2
1/1/18 A B CD 50 75
1/1/18 A B EF 50 75
1/2/18 A B GF 66.67 26.67
1/2/18 A B EF 66.67 26.67
1/2/18 A B FR 66.67 26.67
1/2/18 A L EF 500 30
Any help would be great, thanks!

You can try this query
Select a1.date1,
a1.gran1,
a1.gran2,
a1.gran3,
(b.value1/a2.xCount) as value1,
(b.value2/a2.xCount) as value2
from #tableA A1
Inner join #tableB B on A1.date1 = B.date1
and a1.gran1 = b.gran1
and a1.gran2 = b.gran2
inner join (select date1, gran1, gran2, count(*) xCount
from #tableA
group by date1, gran1, gran2) A2 on A1.date1 = A2.date1
and a1.gran1 = a2.gran1
and a1.gran2 = a2.gran2

Would this query work?
SELECT a.date,
a.gran1,
a.gran2,
a.gran3,
b.value1 * 1.0 / sub.gran_count AS value1, -- * 1.0 avoids integer division if the value columns are integers
b.value2 * 1.0 / sub.gran_count AS value2
FROM table_a a
INNER JOIN table_b b
ON (a.date = b.date
AND a.gran1 = b.gran1
AND a.gran2 = b.gran2)
INNER JOIN (
SELECT date, gran1, gran2, count(*) AS gran_count
FROM table_a
GROUP BY date, gran1, gran2
) sub ON (a.date = sub.date
AND a.gran1 = sub.gran1
AND a.gran2 = sub.gran2);
I don't have access to SQL right now so I can't verify the data (or even if I've missed something in the syntax), but I'm attempting to find the COUNT of each combination of date/gran1/gran2 in a subquery, and then use that in a division in the main query.
Both table_b and the subquery (which I've called "sub") are joined to table_a.
There may be a more efficient way of doing this query using two joins instead of three, but nothing comes to mind right now.
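One candidate for a shorter form, offered as a sketch and not as a measured improvement, is a window function: COUNT(*) OVER (PARTITION BY ...) computes the group size without the extra subquery join. The example runs in SQLite (3.25 or newer for window functions) through Python's sqlite3 module; it assumes table_b has exactly one row per date/gran1/gran2 combination and that the value columns are floating point. The ROUND call is only there to match the two-decimal output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (date TEXT, gran1 TEXT, gran2 TEXT, gran3 TEXT);
CREATE TABLE table_b (date TEXT, gran1 TEXT, gran2 TEXT, value1 REAL, value2 REAL);
INSERT INTO table_a VALUES
  ('1/1/18', 'A', 'B', 'CD'), ('1/1/18', 'A', 'B', 'EF'),
  ('1/2/18', 'A', 'B', 'GF'), ('1/2/18', 'A', 'B', 'EF'),
  ('1/2/18', 'A', 'B', 'FR'), ('1/2/18', 'A', 'L', 'EF');
INSERT INTO table_b VALUES
  ('1/1/18', 'A', 'B', 100, 150),
  ('1/2/18', 'A', 'B', 200, 80),
  ('1/2/18', 'A', 'L', 500, 30);
""")

# The window counts how many table_a rows share date/gran1/gran2,
# and each table_b value is divided by that count.
rows = conn.execute("""
SELECT a.date, a.gran1, a.gran2, a.gran3,
       ROUND(b.value1 / COUNT(*) OVER (PARTITION BY a.date, a.gran1, a.gran2), 2) AS value1,
       ROUND(b.value2 / COUNT(*) OVER (PARTITION BY a.date, a.gran1, a.gran2), 2) AS value2
  FROM table_a a
  JOIN table_b b
    ON a.date = b.date AND a.gran1 = b.gran1 AND a.gran2 = b.gran2
 ORDER BY a.date, a.gran1, a.gran2
""").fetchall()

for row in rows:
    print(row)
```

If table_b could hold several rows per combination, the window would count the joined rows and overstate the divisor, so the subquery version above would be the safer choice in that case.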

Conditional Join in Oracle SQL

Consider below 3 tables.
Table a
Col a Col b Col c
1 000 Actual data
1 001 Actual data
2 000 Actual data
3 000 Actual data
3 001 Actual data
3 002 Actual data
Table b
Col a Col b Col d
1 000 Actual data
1 001 Actual data
2 000 Actual data
Table c
Col a Col b Col d
3 000 Actual data
3 001 Actual data
3 002 Actual data
Table a is the parent table, and tables b and c are child tables. Col a and col b are common to all three tables and are the columns to join on.
The join should work such that table c is searched only if the data is not found in table b.
Desired:
cola col b col c col d
1 000 somedata moredata
1 001 somedata moredata
2 000 somedata moredata
3 000 somedata moredata
3 001 somedata moredata
3 002 somedata moredata
Currently I left join b to a and c to a, but I think every record in a is then searched for in both b and c, which makes the query less cost effective. I want to tune it so that c is searched only if the record does NOT exist in b.

What you really need is a way to "collect" all the rows from table B, and if there are none, then all the rows from table C. Doing the join to A is then standard.
Something like this should work. Make it a subquery and join to your first table.
select col_a, col_b, col_d
from table_b
union all
select col_a, col_b, col_d
from table_c
where (select count(*) from table_b) = 0
If table_b has at least one row, then nothing will be selected from table_c (because the where condition will be false for all rows in table_c). However, if table_b is empty, all the rows from table_c will be selected.

What you need to do is first build a union of tables B and C, ignoring the rows from C whose keys already appear in B, and then join the result with Table A. Thus:
SELECT B.cola, B.colb from B
UNION ALL
SELECT C.cola, C.colb from C
Now using this table, you can join with Table A like:
SELECT A.cola, A.colb, A.colc, tmp.cold
FROM A
JOIN
( SELECT B.cola, B.colb, B.cold FROM B
UNION ALL
SELECT C.cola, C.colb, C.cold FROM C
WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.cola = C.cola AND B.colb = C.colb) ) tmp
ON A.cola = tmp.cola
AND A.colb = tmp.colb

Two left joins:
select a.*, b.*, c.*
from a
left join b
on a.cola=b.cola
and a.colb = b.colb
left join c
on a.cola=c.cola
and a.colb=c.colb
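To get a single col d out of the two left joins, one option is COALESCE, with an extra condition on the second join so that c is matched only when b produced no row. This is a sketch under assumptions: it runs in SQLite through Python's sqlite3 module as a stand-in for Oracle, the values 'from_b' and 'from_c' are made up, and one overlapping row has been added to c (key 1/000) purely to show that b takes priority. Whether the optimizer actually skips the lookup in c is engine-dependent and not verified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (cola INTEGER, colb TEXT, colc TEXT);
CREATE TABLE b (cola INTEGER, colb TEXT, cold TEXT);
CREATE TABLE c (cola INTEGER, colb TEXT, cold TEXT);
INSERT INTO a VALUES
  (1, '000', 'somedata'), (1, '001', 'somedata'), (2, '000', 'somedata'),
  (3, '000', 'somedata'), (3, '001', 'somedata'), (3, '002', 'somedata');
INSERT INTO b VALUES (1, '000', 'from_b'), (1, '001', 'from_b'), (2, '000', 'from_b');
-- The 1/000 row in c is hypothetical: it overlaps b to show the priority rule.
INSERT INTO c VALUES
  (1, '000', 'from_c'), (3, '000', 'from_c'), (3, '001', 'from_c'), (3, '002', 'from_c');
""")

fallback_rows = conn.execute("""
SELECT a.cola, a.colb, a.colc,
       COALESCE(b.cold, c.cold) AS cold
  FROM a
  LEFT JOIN b
         ON b.cola = a.cola AND b.colb = a.colb
  LEFT JOIN c
         ON c.cola = a.cola AND c.colb = a.colb
        AND b.cola IS NULL          -- consult c only when b had no match
 ORDER BY a.cola, a.colb
""").fetchall()

for row in fallback_rows:
    print(row)
```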

Create View with data (not repeated) from 2 tables with the same structure

In an Oracle database, I have 2 tables with the same structure (same columns). One is being migrated to the other. I need to create a view that reads records from both tables, so that all records can be read during the migration. If there are repeated records, only the ones from table 1 should be displayed in the view.
Table 1
USER_ID START_DATE END_DATE
1 2015-08-12 2015-12-08
2 2015-02-25 2015-06-01
3 2015-04-14 2015-09-21
Table 2
USER_ID START_DATE END_DATE
2 2015-02-25 2015-06-01
4 2015-12-20 2016-01-13
The view should contain the following data:
USER_ID START_DATE END_DATE
1 2015-08-12 2015-12-08
2 2015-02-25 2015-06-01
3 2015-04-14 2015-09-21
4 2015-12-20 2016-01-13
Is this possible?
Thanks!

If by repeated records you mean that all 3 columns are the same, then I don't understand why you want them from table 1 specifically, since they are identical.
In that case, you can just create the view as:
select * from table1
union
select * from table2
That will eliminate all duplicates and effectively keep those from table1, since it's the first table (although it doesn't matter).
If you insist on doing it the way you described, perhaps because not all the columns have to be the same, then what you need is a full outer join:
SELECT case when a.user_id is not null then a.user_id else b.user_id end as user_id,
case when a.user_id is not null then a.start_date else b.start_date end as start_date,
case when a.user_id is not null then a.end_date else b.end_date end as end_date
from table1 a full outer join table2 b on (a.user_id = b.user_id)

This is the answer I was looking for, less verbose than the answer from sagi
select *
from table1
union all
select *
from table2 a
where not exists (select null from table1 where user_id = a.user_id);
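A quick way to check the union all / not exists query against the sample data is to wrap it in a view and read it back. The sketch below does that in SQLite through Python's sqlite3 module, as a stand-in for Oracle; the view name merged_users is hypothetical, and dates are stored as text as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (user_id INTEGER, start_date TEXT, end_date TEXT);
CREATE TABLE table2 (user_id INTEGER, start_date TEXT, end_date TEXT);
INSERT INTO table1 VALUES
  (1, '2015-08-12', '2015-12-08'),
  (2, '2015-02-25', '2015-06-01'),
  (3, '2015-04-14', '2015-09-21');
INSERT INTO table2 VALUES
  (2, '2015-02-25', '2015-06-01'),
  (4, '2015-12-20', '2016-01-13');

-- merged_users is a hypothetical view name.
-- All rows from table1, plus table2 rows whose user_id is not in table1.
CREATE VIEW merged_users AS
SELECT * FROM table1
UNION ALL
SELECT * FROM table2 t2
 WHERE NOT EXISTS (SELECT NULL FROM table1 t1 WHERE t1.user_id = t2.user_id);
""")

view_rows = conn.execute("SELECT * FROM merged_users ORDER BY user_id").fetchall()
for row in view_rows:
    print(row)
```

Unlike a plain union, this keeps the table1 row even when the table2 row for the same user_id differs in the date columns.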

Summing columns in two tables then joining tables

I have two tables set up in the same way, but with different values. Here are samples from each:
Table1:
Date Code Count
1/1/2015 AA 4
1/3/2015 AA 2
1/1/2015 AB 3
Table2:
Date Code Count
1/1/2015 AA 1
1/2/2015 AA 0
1/4/2015 AB 2
I would like the result table to contain all unique date-code pairs, with any duplicates between the tables having the counts of the two summed.
Output_Table:
Date Code Count
1/1/2015 AA 5 /*Summed because found in Table1 and Table2*/
1/2/2015 AA 0
1/3/2015 AA 2
1/1/2015 AB 3
1/4/2015 AB 2
I have no primary key to connect the two tables, and the joins I have tried have either not kept all distinct date-code pairs or have created duplicates.
For reference, I am doing this inside a SAS proc sql statement.

I'm on the road at the moment so I haven't run this, but:
SELECT date ,
code ,
SUM([count])
FROM ( SELECT *
FROM table1
UNION ALL
SELECT *
FROM table2
) [tables]
GROUP BY date ,
code
ORDER BY date ,
code
Will do the trick. I'll have a crack at the join version and edit this post when I get in front of a proper computer
EDIT:
Full outer joins and COALESCE will also do it, although it is marginally slower, so it may depend on what else you have going on there!
SELECT COALESCE(#table1.date, #table2.date) ,
COALESCE(#table1.code, #table2.code) ,
ISNULL(#table1.COUNT, 0) + ISNULL(#table2.COUNT, 0)
FROM #table1
FULL OUTER JOIN #table2 ON #table2.code = #table1.code
AND #table2.date = #table1.date
ORDER BY COALESCE(#table1.date, #table2.date) ,
COALESCE(#table1.code, #table2.code)
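The union all / group by version can be checked against the sample data with a small sketch. It runs in SQLite through Python's sqlite3 module, which is only a stand-in for SAS proc sql; the output alias total_count is an illustrative name, the count column is quoted as a precaution, and dates are kept as the text values shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (date TEXT, code TEXT, "count" INTEGER);
CREATE TABLE table2 (date TEXT, code TEXT, "count" INTEGER);
INSERT INTO table1 VALUES ('1/1/2015', 'AA', 4), ('1/3/2015', 'AA', 2), ('1/1/2015', 'AB', 3);
INSERT INTO table2 VALUES ('1/1/2015', 'AA', 1), ('1/2/2015', 'AA', 0), ('1/4/2015', 'AB', 2);
""")

# Stack both tables, then sum the counts per date/code pair.
summed = conn.execute("""
SELECT date, code, SUM("count") AS total_count
  FROM (SELECT * FROM table1
        UNION ALL
        SELECT * FROM table2) AS both_tables
 GROUP BY date, code
 ORDER BY code, date
""").fetchall()

for row in summed:
    print(row)
```

Sorting text dates like these only works for this small sample; with real date columns the ORDER BY would behave as expected across months and years.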
