Optimize a complex PostgreSQL Query - sql

I am attempting a complex SQL join across several tables, as shown below. I have also included an image of the DB schema.
Consider table_1:
e_id  name
----  ----
1     a
2     b
3     c
4     d
and table_2:
e_id  date
----  --------
1     1/1/2019
1     1/1/2020
2     2/1/2019
4     2/1/2019
The issue here is performance. From tables 2 to 4 we only want the most recent entry for a given e_id, but because these tables contain historical data (roughly 3.5M rows or more) the query is quite slow. Below is an example of how we currently try to achieve this, although it only includes one join of 'table_1' with 'table_x': we group by e_id and take the max date for it. The other approach we have considered is creating a materialized view, pulling data from that, and refreshing it periodically. Any improvements are welcome.
from fds.region as rg
inner join (
select e_id, name, p_id
from fds.table_1
where sec_type = 'S' AND active_flag = 1
) as table_1 on table_1.e_id = rg.e_id
inner join fds.table_2 table_2 on table_2.e_id = rg.e_id
inner join fds.sec sec on sec.p_id = table_1.p_id
inner join fds.entity ent on ent.int_entity_id = sec.int_entity_id
inner join (
SELECT int_1.e_id, int_1.date, int_1.int_price
FROM fds.table_4 int_1
INNER JOIN (
SELECT e_id, MAX(date) date
FROM fds.table_2
GROUP BY e_id
) int_2 ON int_1.e_id = int_2.fsym_id AND int_1.date = int_2.date
) as table_4 on table_4.e_id = rg.e_id
where rg.region_str like '%US' and ent.sec_type = 'P'
order by table_2.int_price
limit 500;

You can simplify this logic:
(
SELECT int_1.e_id, int_1.date, int_1.int_price
FROM fds.table_4 int_1
INNER JOIN (
SELECT e_id, MAX(date) date
FROM fds.table_2
GROUP BY e_id
) int_2 ON int_1.e_id = int_2.fsym_id AND int_1.date = int_2.date
) as table_4
To:
(SELECT DISTINCT ON (int_1.e_id) int_1.*
FROM fds.table_4 int_1
ORDER BY int_1.e_id, int_1.date DESC
) table_4
This can take advantage of an index on fds.table_4(e_id, date desc) -- and might be wicked fast with such an index.
You also want appropriate indexes for the joins and filtering. However, it is hard to be more specific without an execution plan.
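To make the intended result concrete, here is a minimal sketch of "latest row per e_id" on the table_2 sample data from the question. It is not the PostgreSQL query itself: it runs on an in-memory SQLite database through Python's sqlite3 module, and because SQLite has no DISTINCT ON, it swaps in ROW_NUMBER() as a portable equivalent. The ISO date strings are an assumption (the question's 2/1/2019 is read as 1 February 2019).

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_2 (e_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO table_2 VALUES (?, ?)",
    [
        (1, "2019-01-01"),
        (1, "2020-01-01"),
        (2, "2019-02-01"),
        (4, "2019-02-01"),
    ],
)

# Number the rows of each e_id from newest to oldest, then keep the newest.
# In PostgreSQL, DISTINCT ON (e_id) ... ORDER BY e_id, date DESC returns the same rows.
latest_rows = conn.execute(
    """
    SELECT e_id, date
    FROM (
        SELECT e_id, date,
               ROW_NUMBER() OVER (PARTITION BY e_id ORDER BY date DESC) AS rn
        FROM table_2
    )
    WHERE rn = 1
    ORDER BY e_id
    """
).fetchall()

print(latest_rows)
```

Only e_id 1 has more than one entry, so it is the only one whose older row is dropped.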

Related

Filtering Join in Oracle DB

Problem:
Each KEY in Table A should have one RF record and one SJ record however I have some duplicated SJ records.
Objective:
I wish to use the SJ records in Table B to identify which SJ record in Table A to keep.
Info:
Table A and Table B share a KEY and SEQ_NBR field.
Inputs:
Table A looks as follows
KEY ID_TYPE SEQ_NBR BUS_NAME
1234 RF 1 COMP_A
1234 SJ 2 COMP_B
1234 SJ 4 COMP_C
5678 RF 1 COMP_L
5678 SJ 2 COMP_M
5678 SJ 3 COMP_N
Table B looks as follows
KEY SEQ_NBR BUS_NAME
1234 2 COMP_B
5678 3 COMP_N
Desired Outcome:
My output would look as follows
KEY ID_TYPE SEQ_NBR BUS_NAME
1234 RF 1 COMP_A
1234 SJ 2 COMP_B
5678 RF 1 COMP_L
5678 SJ 3 COMP_N
Here is one way:
select key, id_type, seq_nbr, bus_name
from (
select a.*,
row_number() over (partition by a.key, a.id_type
order by b.key) as rn
from a left outer join b on a.key = b.key and a.seq_nbr = b.seq_nbr
)
where rn = 1
;
The left outer join adds columns from table b to those of table a. We need that for a single purpose: as we partition by key and id_type, each partition contains either a single row or two or more rows. In the latter case, only one row has a non-null value in b.key. If we order by b.key, the row with non-null b.key gets row number = 1, because Oracle sorts nulls last in ascending order by default (and we don't care about the rest).
Then the outer query simply keeps all the rows with row number = 1 and ignores the rest.
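The row-number approach can be tried on the sample data with a small sketch. This is an illustration under assumptions, not the Oracle query itself: it runs on an in-memory SQLite database through Python's sqlite3 module, the KEY column is renamed to the hypothetical rec_key to avoid clashing with an SQL keyword, and the sort expression is `b.rec_key IS NULL` because SQLite, unlike Oracle, sorts nulls first in ascending order.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (rec_key INTEGER, id_type TEXT, seq_nbr INTEGER, bus_name TEXT);
    CREATE TABLE b (rec_key INTEGER, seq_nbr INTEGER, bus_name TEXT);
    INSERT INTO a VALUES
        (1234, 'RF', 1, 'COMP_A'),
        (1234, 'SJ', 2, 'COMP_B'),
        (1234, 'SJ', 4, 'COMP_C'),
        (5678, 'RF', 1, 'COMP_L'),
        (5678, 'SJ', 2, 'COMP_M'),
        (5678, 'SJ', 3, 'COMP_N');
    INSERT INTO b VALUES
        (1234, 2, 'COMP_B'),
        (5678, 3, 'COMP_N');
    """
)

# Rows of a that match b sort first within each (rec_key, id_type) partition,
# so they receive rn = 1; single-row partitions get rn = 1 trivially.
kept_rows = conn.execute(
    """
    SELECT rec_key, id_type, seq_nbr, bus_name
    FROM (
        SELECT a.*,
               ROW_NUMBER() OVER (
                   PARTITION BY a.rec_key, a.id_type
                   ORDER BY b.rec_key IS NULL
               ) AS rn
        FROM a
        LEFT JOIN b ON a.rec_key = b.rec_key AND a.seq_nbr = b.seq_nbr
    )
    WHERE rn = 1
    ORDER BY rec_key, id_type
    """
).fetchall()

for row in kept_rows:
    print(row)
```

The duplicated SJ rows COMP_C and COMP_M have no match in b, so they are the ones filtered out.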
An alternative solution, using the union all of the two tables (slightly modified as needed) and basic aggregation using the last aggregate function:
select key, id_type,
min(seq_nbr) keep (dense_rank last order by source) as seq_nbr,
min(bus_name) keep (dense_rank last order by source) as bus_name
from (
select 'A' as source, a.* from a
union all
select 'B', key, 'SJ', seq_nbr, bus_name from b
)
group by key, id_type
;
You can test both to see which is more efficient on your data (if performance is important).
Here goes your code:
select * from tablea a
where exists
(select 1 from tableb b where b.key=a.key and b.seq_nbr=a.seq_nbr)
or not exists (select tablea.id_type from tablea inner join tableb on tablea.key=tableb.key and tablea.SEQ_NBR=tableb.SEQ_NBR and tablea.id_type=a.id_type)
If I understand correctly, you can count the number of duplicates. Then use left join and filter based on both the count and the match:
select a.*
from (select a.*,
count(*) over (partition by key, id_type) as cnt
from a
) a left join
b
on b.key = a.key and
b.seq_nbr = a.seq_nbr and
b.bus_name = a.bus_name
where cnt = 1 or b.key is not null;

Pivot Creates an extra Row for Null values of Pivot Key

I am using Oracle SQL Developer and trying to pivot based on IssueId, expecting to get one row. Instead, the pivot creates an extra row for null values of the issue ID, but I want all values in one row. Please see the scenario below for a better explanation of this confusing problem.
Table I have after joins :
Data:
Current Result with Extra Null Row:
Expected Result:
My Query:
select *
from
(
select TableB.SEQ_ID,TableA.ISSUEID , TableC.Question
from TableC RIGHT JOIN TableB ON TableC.QUESTIONID = TableB.QUESTION_ID
LEFT JOIN TableA ON TableB.QUESTION_ID = TableA.QUESTIONID AND ISSUEID = 3250
) d
pivot
(
MAX(Question)
for SEQ_ID in ( 1, 2, 3 ,4, 5 )
) piv;
Any suggestions related to this are appreciated. Thank you!
Assuming the intermediate result set is what you expect to see and the left/right outer joins are necessary for what you're trying to do, you could just replace the null issue IDs from rows not matching in table A, using an analytic aggregate:
max(TableA.ISSUEID) over () as ISSUEID
With sample data that gets the same intermediate result you showed, the modified inner query:
select TableB.SEQ_ID, max(TableA.ISSUEID) over () as ISSUEID, TableC.Question
from TableC
RIGHT JOIN TableB ON TableC.QUESTIONID = TableB.QUESTION_ID
LEFT JOIN TableA ON TableB.QUESTION_ID = TableA.QUESTIONID AND TableA.ISSUEID = 3250;
gets
SEQ_ID ISSUEID QUESTION
---------- ---------- -------------
1 3250 How are You?
2 3250 Hows it going
3 3250 Is It Okay?
4 3250 Whats Up?
5 3250 Really?
and when pivoted:
select *
from
(
select TableB.SEQ_ID, max(TableA.ISSUEID) over () as ISSUEID, TableC.Question
from TableC
RIGHT JOIN TableB ON TableC.QUESTIONID = TableB.QUESTION_ID
LEFT JOIN TableA ON TableB.QUESTION_ID = TableA.QUESTIONID AND TableA.ISSUEID = 3250
) d
pivot
(
MAX(Question)
for SEQ_ID in ( 1, 2, 3 ,4, 5 )
) piv;
gets
ISSUEID 1 2 3 4 5
---------- ------------- ------------- ------------- ------------- -------------
3250 How are You? Hows it going Is It Okay? Whats Up? Really?
You could get the same result with:
select * from (
select b.seq_id, a.issueid, c.question
from tableb b
join tablec c on c.questionid = b.question_id
cross join (
select issueid from tablea where issueid = 3250
) a
) d
pivot
(
max(question)
for seq_id in (1, 2, 3 ,4, 5)
);
(or cross apply in 12c+), which also works with condition >= 3250. This just lists all five questions against every matching issue ID. That may not actually be what you want to do, but it's what your very limited sample data and expected results suggest.
If you only want valid issues, then don't mess around with all those outer joins. Just start with the table that has the rows you want to keep, use left join for the other tables, and filter the first table in the where clause:
from (select TableB.SEQ_ID, TableA.ISSUEID , TableC.Question
from TableA LEFT JOIN
TableB
on TableB.QUESTION_ID = TableA.QUESTIONID LEFT JOIN
TableC
on TableC.QUESTIONID = TableB.QUESTION_ID
where tableA.ISSUEID = 3250
) d

Union Three or more tables with conditions

I need help solving a problem.
I have a table levelAsignment with columns level_id, store_id and user_id. For each user_id I can write a query to get that user's level_ids and store_ids.
I also have a table stores.
For each store, I need to get its level and count the users for that level and store.
That would be easy, but the problem is how the data is stored: in the levelAsignment table, a user can be assigned all stores for some operator level.
It looks like this:
level_id | store_Id | user_id
4 1 5
1 5 5
6 1
A store_id of 1 (as defined in the stores table) means all stores, so I need to show all stores except 1:
select * from stores where id != 1;
So I need advice on how to organize that.
I found different ways to solve the problem, but they involved many unions and conditions.
This depends on how you are able to join the stores table.
I think you should join the level_assignment rows where store_id = 1 with all rows in the stores table, wrapped in a subquery whose outer query drops the level_assignment store_id (always 1) in favor of the store_id from stores. You may have to create a join column in temporary tables for the stores data. Then union the result with the level_assignment rows where store_id != 1.
Example:
WITH get_all_stores_for_store_id_1 AS (
SELECT
a.level_id,
-- keep only the store_id from stores; a.store_id is always 1 here
b.store_id AS store_id,
a.user_id
FROM level_assignment a
LEFT JOIN stores b ON a.join_column = b.join_column
WHERE a.store_id = 1)
SELECT
level_id,
store_id,
user_id
FROM get_all_stores_for_store_id_1
UNION
SELECT
level_id,
store_id,
user_id
FROM level_assignment
WHERE store_id != 1
Does that make sense?
Thinking about how to join the data, we could do something like this:
Get the stores table and add a constant column holding 1 in every row, so that we can then join all stores to the level_assignment rows with store_id = 1:
WITH set_1_column_in_stores_table AS (
SELECT
1 AS join_id,
store_id
FROM stores),
all_store_rows_get_all_stores AS (
SELECT
a.level_id,
-- keep only the store_id from stores; a.store_id is always 1 here
b.store_id AS store_id,
a.user_id
FROM level_assignment a
LEFT JOIN set_1_column_in_stores_table b ON a.store_id = b.join_id
-- The above will join all stores where store_id = 1 in level_assignment
WHERE a.store_id = 1)
SELECT
level_id,
store_id,
user_id
FROM all_store_rows_get_all_stores
UNION
SELECT
level_id,
store_id,
user_id
FROM level_assignment
WHERE store_id != 1
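The constant join column idea can be checked with a small sketch. It is an illustration under assumptions: it runs on an in-memory SQLite database through Python's sqlite3 module, the store ids 1, 2, 3 and 5 are hypothetical, the stores key is called id as in the question, and the placeholder store 1 is filtered out of the expansion as the question asks. Only the two complete sample rows from the question are loaded.

```python
import sqlite3

# In-memory SQLite database; table and column names follow the question,
# the store ids are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stores (id INTEGER PRIMARY KEY);
    CREATE TABLE level_assignment (level_id INTEGER, store_id INTEGER, user_id INTEGER);
    INSERT INTO stores VALUES (1), (2), (3), (5);
    INSERT INTO level_assignment VALUES (4, 1, 5), (1, 5, 5);
    """
)

expanded = conn.execute(
    """
    WITH stores_with_join_id AS (
        -- Every real store gets join_id = 1 so it matches the "all stores" rows.
        SELECT 1 AS join_id, id AS store_id
        FROM stores
        WHERE id != 1
    )
    SELECT la.level_id, s.store_id, la.user_id
    FROM level_assignment la
    JOIN stores_with_join_id s ON la.store_id = s.join_id
    UNION
    SELECT level_id, store_id, user_id
    FROM level_assignment
    WHERE store_id != 1
    """
).fetchall()

# Sort in Python so the printed order does not depend on the database.
expanded_sorted = sorted(expanded)
print(expanded_sorted)
```

The row (4, 1, 5) fans out to one row per real store, while (1, 5, 5) passes through unchanged; counting users per level and store is then a plain GROUP BY over this result.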

SQL deleting records with group by multiple tables

I am trying to delete duplicate records in a table, but only if they are duplicates per record from another table.
The following query gets me the number of duplicate records per 'bodyshop'.
I'm trying to delete the duplicate invoices for each bodyshop.
SELECT
inv.InvoiceNo, job.BodyshopId, COUNT(*)
FROM
[Test].[dbo].[Invoices] as inv
join [Test].[dbo].Repairs as rep on rep.Id = inv.RepairId
join [Test].[dbo].Jobs as job on job.Id = rep.JobsId
GROUP BY
inv.InvoiceNo, job.BodyshopId
HAVING
COUNT(*) > 1
I want the duplicate invoice numbers per bodyshop to be deleted, but I do want the original one to remain.
InvoiceNo BodyshopId (No column name)
29737 16 2
29987 16 3
30059 16 2
23491 139 2
23608 139 3
23867 139 4
23952 139 3
I only want invoice number 29737 to appear once against BodyshopId 16, and so on.
Hope that makes sense.
Thanks
Perhaps this :
with cte as (
SELECT
inv.ID, inv.InvoiceNo, job.BodyshopId, rn = row_number() over (partition by inv.InvoiceNo, job.BodyshopId order by inv.InvoiceNo, job.BodyshopId)
FROM
[Test].[dbo].[Invoices] as inv
join [Test].[dbo].Repairs as rep on rep.Id = inv.RepairId
join [Test].[dbo].Jobs as job on job.Id = rep.JobsId
)
delete t1
from [Test].[dbo].[Invoices] t1 inner join cte t2 on t1.ID = t2.ID
where t2.rn > 1
Edit 1 - Your comments are correct. One solution is to add an identity column to the Invoices table; I've adapted my query accordingly.
To add / remove an identity column:
alter table [Test].[dbo].[Invoices] add id int identity(1,1)
alter table [Test].[dbo].[Invoices] drop column id
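The row-number delete can be tried on a small data set with the sketch below. It is an illustration under assumptions rather than the SQL Server statement: it runs on an in-memory SQLite database through Python's sqlite3 module, SQLite cannot delete through a joined CTE, so the duplicate ids are collected in a subquery instead, and the lower-case table names, column names and rows are hypothetical. Ordering by inv.id makes the lowest id in each group the one that is kept.

```python
import sqlite3

# In-memory SQLite database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, bodyshop_id INTEGER);
    CREATE TABLE repairs (id INTEGER PRIMARY KEY, jobs_id INTEGER);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, invoice_no INTEGER, repair_id INTEGER);
    INSERT INTO jobs VALUES (1, 16), (2, 139);
    INSERT INTO repairs VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO invoices VALUES
        (1, 29737, 1),
        (2, 29737, 2),
        (3, 29737, 3),
        (4, 30059, 1);
    """
)

# Number the invoices within each (invoice_no, bodyshop_id) group and
# delete every row after the first one.
conn.execute(
    """
    DELETE FROM invoices
    WHERE id IN (
        SELECT id
        FROM (
            SELECT inv.id AS id,
                   ROW_NUMBER() OVER (
                       PARTITION BY inv.invoice_no, job.bodyshop_id
                       ORDER BY inv.id
                   ) AS rn
            FROM invoices AS inv
            JOIN repairs AS rep ON rep.id = inv.repair_id
            JOIN jobs AS job ON job.id = rep.jobs_id
        )
        WHERE rn > 1
    )
    """
)

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM invoices ORDER BY id")]
print(remaining_ids)
```

Invoice 2 repeats number 29737 for bodyshop 16 and is removed; invoice 3 carries the same number but belongs to bodyshop 139, so it stays.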
You may run the following. Because the duplicate records share the same invoice number and bodyshop, GROUP BY returns a single row per invoice, and only the row with the highest id in each group is kept:
DELETE FROM [Test].[dbo].[Invoices]
WHERE id NOT IN (
SELECT MAX(inv.id)
FROM
[Test].[dbo].[Invoices] as inv
join [Test].[dbo].Repairs as rep on rep.Id = inv.RepairId
join [Test].[dbo].Jobs as job on job.Id = rep.JobsId
-- one surviving id per (InvoiceNo, BodyshopId) group
-- note: invoices with no matching Repairs/Jobs row are not returned here, so they would be deleted too
GROUP BY
inv.InvoiceNo, job.BodyshopId
)
id is the primary key.
General SQL. Modify if needed for sql-server.

pad database out with NULL criteria

If I have the following sample table (order by ID)
ID Date Type
-- ---- ----
1 01/01/2000 A
2 22/04/1995 A
2 14/02/2001 B
Here you can immediately see that ID=1 does not have a Type=B, but ID=2 does. What I want to do is fill in a line to show this:
ID Date Type
-- ---- ----
1 01/01/2000 A
1 NULL B
2 22/04/1995 A
2 14/02/2001 B
There could potentially be hundreds of different types (so I may end up inserting hundreds of rows per person if they lack hundreds of types).
Is there a general solution to do this?
Could I possibly outer join the table on itself and do it that way?
You can do this with a cross join to generate all the rows and a left join to get the actual data values:
select i.id, s.date, t.type
from (select distinct id from sample) i cross join
(select distinct type from sample) t left join
sample s
on s.id = i.id and
s.type = t.type;
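As a quick check, the cross join plus left join can be run on the sample table. This is a minimal sketch assuming an in-memory SQLite database driven from Python's sqlite3 module; the dates are rewritten as ISO strings and an ORDER BY is added so the output order is fixed.

```python
import sqlite3

# In-memory SQLite database holding the three sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sample (id INTEGER, date TEXT, type TEXT);
    INSERT INTO sample VALUES
        (1, '2000-01-01', 'A'),
        (2, '1995-04-22', 'A'),
        (2, '2001-02-14', 'B');
    """
)

# The cross join builds every (id, type) combination; the left join fills in
# the date where a real row exists and leaves NULL (None in Python) otherwise.
padded = conn.execute(
    """
    SELECT i.id, s.date, t.type
    FROM (SELECT DISTINCT id FROM sample) i
    CROSS JOIN (SELECT DISTINCT type FROM sample) t
    LEFT JOIN sample s
           ON s.id = i.id
          AND s.type = t.type
    ORDER BY i.id, t.type
    """
).fetchall()

for row in padded:
    print(row)
```

The row for ID 1 and type B comes back with a null date, which is the padding line the question asks for.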