Performance tuning of Oracle SQL query

Can someone help me tune this query? It takes about 1 minute to return the data in SQL Developer.
SELECT
masterid, notification_id, notification_list, typeid,
subject, created_at, created_by, approver, sequence_no,
productid, statusid, updated_by, updated_at, product_list,
notification_status, template, notification_type, classification
FROM
(
SELECT
masterid, notification_id, notification_list, typeid, subject,
approver, created_at, created_by, sequence_no, productid,
statusid, updated_by, updated_at, product_list, notification_status,
template, notification_type, classification,
ROW_NUMBER() OVER(ORDER BY masterid DESC)AS r
FROM
(
SELECT DISTINCT
a.masterid AS masterid,
a.maxid AS notification_id,
notification_list,
typeid,
noti.subject AS subject,
noti.approver AS approver,
noti.created_at AS created_at,
noti.created_by AS created_by,
noti.sequence_no AS sequence_no,
a.productid AS productid,
a.statusid AS statusid,
noti.updated_by AS updated_by,
noti.updated_at AS updated_at,
(
SELECT LISTAGG(p.name,',') WITHIN GROUP(ORDER BY p.id) AS list_noti
FROM product p
INNER JOIN notification_product np ON np.product_id = p.id
WHERE notification_id = a.maxid
) AS product_list,
(
SELECT description
FROM notification_status
WHERE id = a.statusid
) AS notification_status,
(
SELECT name
FROM template
WHERE id = a.templateid
) AS template,
(
SELECT description
FROM notification_type
WHERE id = a.typeid
) AS notification_type,
(
SELECT tc.description
FROM template_classification tc
INNER JOIN notification nt ON tc.id = nt.classification_id
WHERE nt.id = a.maxid
) AS classification
FROM
(
SELECT
nm.id AS masterid,
nm.product_id AS productid,
nm.notification_status_id AS statusid,
nm.template_id AS templateid,
nm.notification_type_id AS typeid,
(
SELECT MAX(id)
FROM notification
WHERE notification_master_id = nm.id
) AS maxid,
(
SELECT LISTAGG(n.id,',') WITHIN GROUP(ORDER BY nf.id) AS list_noti
FROM notification n
WHERE notification_master_id = nm.id
) AS notification_list
FROM notification_master nm
INNER JOIN notification nf ON nm.id = nf.notification_master_id
WHERE nm.disable = 'N'
ORDER BY nm.id DESC
) a
INNER JOIN notification noti
ON a.maxid = noti.id
AND
(
(
(
TO_DATE('01-jan-1970','dd-MM-YYYY') +
numtodsinterval(created_at / 1000,'SECOND')
) <
(current_date + INTERVAL '-21' DAY)
)
OR (typeid exists(2,4) AND statusid = 4)
)
)
)
WHERE r BETWEEN 11 AND 20

DISTINCT is very often an indicator for a badly written query. A normalized database doesn't contain duplicate data, so where do the duplicates suddenly come from that you must remove with DISTINCT? Very often it is your own query producing these. Avoid producing duplicates in the first place, so you don't need DISTINCT later.
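In your case you are joining with the table notification in your subquery a, but you are not using its rows in that subquery; you only select from notification_master. The join therefore repeats each notification master once per related notification, and these are the duplicates that DISTINCT has to remove again.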
After all, you want to get notification masters, get their latest related notification (by getting its ID first and then select the row). You don't need hundreds of subqueries to achieve this.
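The duplication described above can be reproduced on a toy schema. The following is only a sketch: it uses Python's sqlite3 module as a stand-in for Oracle, and the two tables are cut down to the columns that matter, with made-up rows. It contrasts the unused join with aggregating notification first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE notification_master (id INTEGER PRIMARY KEY, disable TEXT);
    CREATE TABLE notification (
        id INTEGER PRIMARY KEY,
        notification_master_id INTEGER REFERENCES notification_master (id)
    );
    INSERT INTO notification_master VALUES (1, 'N'), (2, 'N');
    -- hypothetical data: master 1 has three notifications, master 2 has one
    INSERT INTO notification VALUES (10, 1), (11, 1), (12, 1), (20, 2);
""")

# Joining notification without selecting from it repeats each master row
# once per related notification.
joined = conn.execute("""
    SELECT nm.id
    FROM notification_master nm
    JOIN notification nf ON nf.notification_master_id = nm.id
    WHERE nm.disable = 'N'
    ORDER BY nm.id
""").fetchall()

# Aggregating notification first yields one row per master, so no DISTINCT
# is needed afterwards.
aggregated = conn.execute("""
    SELECT nm.id, nfagg.maxid
    FROM notification_master nm
    JOIN (
        SELECT notification_master_id, MAX(id) AS maxid
        FROM notification
        GROUP BY notification_master_id
    ) nfagg ON nfagg.notification_master_id = nm.id
    WHERE nm.disable = 'N'
    ORDER BY nm.id
""").fetchall()

print(joined)
print(aggregated)
```

Master 1 shows up three times in the first result and once in the second.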
Some side notes:
To get the description from template_classification you are joining again with the notification table, which is not necessary.
ORDER BY in a subquery (ORDER BY nm.id DESC) is superfluous, because subquery results are per standard SQL unsorted. (Oracle violates this standard sometimes in order to apply ROWNUM on the result, but you are not using ROWNUM in your query.)
It's a pity that you store created_at not as a DATE or TIMESTAMP, but as a number. This forces you to calculate. I don't think this has a great impact on your query, though, because you are using it in an OR condition.
CURRENT_DATE gets you the date in the session (client) time zone. This is rarely wanted, as you select data from the database, which should of course not relate to some client's date, but to the database server's own date, SYSDATE.
If I am not mistaken, your query can be shortened to:
SELECT
nm.id AS masterid,
nf.id AS notification_id,
nfagg.notification_list AS notification_list,
nm.notification_type_id AS typeid,
nf.subject AS subject,
nf.approver AS approver,
nf.created_at AS created_at,
nf.created_by AS created_by,
nf.sequence_no AS sequence_no,
nm.product_id AS productid,
nm.notification_status_id AS statusid,
nf.updated_by AS updated_by,
nf.updated_at AS updated_at,
(
SELECT LISTAGG(p.name, ',') WITHIN GROUP (ORDER BY p.id)
FROM product p
INNER JOIN notification_product np ON np.product_id = p.id
WHERE np.notification_id = nf.id
) AS product_list,
(
SELECT description
FROM notification_status
WHERE id = nm.notification_status_id
) AS notification_status,
(
SELECT name
FROM template
WHERE id = nm.template_id
) AS template,
(
SELECT description
FROM notification_type
WHERE id = nm.notification_type_id
) AS notification_type,
(
SELECT description
FROM template_classification
WHERE id = nf.classification_id
) AS classification
FROM notification_master nm
INNER JOIN
(
SELECT
notification_master_id,
MAX(id) AS maxid,
LISTAGG(id,',') WITHIN GROUP (ORDER BY id) AS notification_list
FROM notification
GROUP BY notification_master_id
) nfagg ON nfagg.notification_master_id = nm.id
INNER JOIN notification nf
ON nf.id = nfagg.maxid
AND
(
(
DATE '1970-01-01' + NUMTODSINTERVAL(nf.created_at / 1000, 'SECOND')
< CURRENT_DATE + INTERVAL '-21' DAY
)
OR (nm.notification_type_id IN (2,4) AND nm.notification_status_id = 4)
)
WHERE nm.disable = 'N'
ORDER BY nm.id DESC
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY;
As mentioned, you may want to replace CURRENT_DATE with SYSDATE.
I recommend the following indexes for the query:
CREATE INDEX idx1 ON notification_master (disable, id, notification_status_id, notification_type_id);
CREATE INDEX idx2 ON notification (notification_master_id, id, created_at);
A last remark on paging: In order to skip n rows to get the next n, the whole query must get executed for all data and then all result rows be sorted, only to pick n of them at last. It is usually better to remember the last fetched ID and then, because the result is sorted by ID descending, only select rows with a lower ID in the next execution.
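A minimal sketch of that paging idea (often called keyset or seek pagination), again on SQLite via Python rather than Oracle. The helper fetch_page and the 30 sample ids are hypothetical. Since the query sorts by id descending, the next page holds the ids below the last one seen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notification_master (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO notification_master VALUES (?)",
    [(i,) for i in range(1, 31)],  # hypothetical ids 1..30
)


def fetch_page(conn, last_seen_id=None, page_size=10):
    """Return the next page of ids in descending order.

    last_seen_id is the smallest id of the previous page (None for page 1).
    """
    if last_seen_id is None:
        rows = conn.execute(
            "SELECT id FROM notification_master ORDER BY id DESC LIMIT ?",
            (page_size,),
        )
    else:
        # Only rows below the last id seen are read; nothing is skipped over.
        rows = conn.execute(
            "SELECT id FROM notification_master "
            "WHERE id < ? ORDER BY id DESC LIMIT ?",
            (last_seen_id, page_size),
        )
    return [r[0] for r in rows]


page1 = fetch_page(conn)
page2 = fetch_page(conn, last_seen_id=page1[-1])
print(page1, page2)
```

In Oracle 12c and later the LIMIT would become FETCH FIRST 10 ROWS ONLY; with an index on the id, the database can stop after reading one page of rows.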

Related

How to simplify multiple CTE

I have several similar CTEs, actually 9. They differ only in the WHERE clause of the subquery, in the value compared with the column for.
WITH my_cte_1 AS (
SELECT id,
"time",
LEAD("time",1) OVER (
PARTITION BY id
ORDER BY id,"time"
) next_time
FROM history
where id IN (SELECT id FROM req WHERE type = 'sup' AND for = 1)
),
my_cte_2 AS (
SELECT id,
"time",
LEAD("time",1) OVER (
PARTITION BY id
ORDER BY id,"time"
) next_time
FROM history
where id IN (SELECT id FROM req WHERE type = 'sup' AND for = 2)
),
my_cte_3 AS (
SELECT id,
"time",
LEAD("time",1) OVER (
PARTITION BY id
ORDER BY id,"time"
) next_time
FROM history
where id IN (SELECT id FROM req WHERE type = 'sup' AND for = 3)
)
SELECT
'History' AS "Indic",
(SELECT count(DISTINCT(id)) FROM my_cte_1 ) AS "cte1",
(SELECT count(DISTINCT(id)) FROM my_cte_2 ) AS "cte2",
(SELECT count(DISTINCT(id)) FROM my_cte_3 ) AS "cte3",
My database is read-only, so I can't create functions.
Each CTE processes a large number of rows.
Is there a way to set up a parameter for the column for, or some other workaround?
I'm assuming a little bit here, but I would think something like this would work:
with cte as (
SELECT
h.id, h."time",
LEAD(h."time",1) OVER (PARTITION BY h.id ORDER BY h.id, h."time") next_time,
r.for
FROM
history h
join req r on
r.type = 'sup' and
h.id = r.id and
r.for between 1 and 3
)
select
'History' AS "Indic",
count (distinct id) filter (where "for" = 1) as cte1,
count (distinct id) filter (where "for" = 2) as cte2,
count (distinct id) filter (where "for" = 3) as cte3
from cte
This would avoid multiple passes on the various tables and should run much quicker unless these are highly selective values.
Another note... the "lead" analytic function doesn't appear to be used. If this is really all there is to your query, you can omit that and make it run a lot faster. I left it in assuming it had some other purpose.
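The single-pass idea can be tried out on toy data. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module, the rows are invented, and it uses COUNT(DISTINCT CASE ...) as a portable stand-in for the PostgreSQL FILTER clause. The column for is quoted because it is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE req (id INTEGER, type TEXT, "for" INTEGER);
    CREATE TABLE history (id INTEGER, "time" INTEGER);
    -- hypothetical rows: ids 1 and 2 are for = 1, id 3 is for = 2,
    -- id 4 has the wrong type and must not be counted
    INSERT INTO req VALUES (1, 'sup', 1), (2, 'sup', 1), (3, 'sup', 2), (4, 'other', 3);
    INSERT INTO history VALUES (1, 100), (1, 200), (2, 100), (3, 100), (4, 100);
""")

# One pass over history and req; each CASE plays the role of one former CTE.
row = conn.execute("""
    SELECT
        COUNT(DISTINCT CASE WHEN r."for" = 1 THEN h.id END) AS cte1,
        COUNT(DISTINCT CASE WHEN r."for" = 2 THEN h.id END) AS cte2,
        COUNT(DISTINCT CASE WHEN r."for" = 3 THEN h.id END) AS cte3
    FROM history h
    JOIN req r ON r.id = h.id AND r.type = 'sup'
    WHERE r."for" BETWEEN 1 AND 3
""").fetchone()

print(row)
```

The CASE expression yields NULL for non-matching rows, and COUNT ignores NULLs, which is what makes it equivalent to a filtered count.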

Select rows with max(id) having max(payment_date)

I've been trying to get the max(id) for the max(payment_date) of every account_id, as there are instances where there's different entries for the same max(payment_date). The ids are the payment references for the account_ids. So every account_id needs to have one entry with the max(payment_date) and the max(id) for that date. Problem is that there are entries where the max(id) for the account_id is not for the max(payment_date), or I would have just used max(id). The code below is not working because of this, since it will exclude entries where the max(id) is not for the max(payment_date). Thanks in advance.
select *
from (
select payments.*
from (
select account_id, max(payment_date) as last_payment, max(id) as last_payment1
from energy.payments
where state = 'success'
and amount_pennies > 0
and description not ilike '%credit%'
group by account_id
) as last_payment_table
inner join energy.payments as payments
on payments.account_id = last_payment_table.account_id
and payments.payment_date = last_payment_table.last_payment
and payments.id = last_payment_table.last_payment1
) as paymentst1
Use distinct on. I can't really follow your query (sample data would be a big help!), but the idea is:
select distinct on (p.account_id) p.*
from energy.payments p
order by p.account_id, p.payment_date desc, p.id desc;
You can add additional logic for filtering or whatever. That logic is not explained in your question but is suggested by the code you've included.
It is hard to understand the question, but I think you mean this:
SELECT *
FROM payments p
WHERE NOT EXISTS (
SELECT *
FROM payments nx
WHERE nx.account_id = p.account_id -- same account
AND (nx.payment_date > p.payment_date -- more recent date
OR (nx.payment_date = p.payment_date AND nx.id > p.id)) -- or same date with a higher ID
);
Or, using a window function:
select *
from (
select *
, row_number() OVER(PARTITION BY account_id
ORDER BY payment_date DESC,id DESC) as rn
from energy.payments
where state = 'success'
and amount_pennies > 0
and description not ilike '%credit%'
) x
WHERE x.rn=1
;
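To see why taking MAX(payment_date) and MAX(id) independently loses accounts, here is a small sketch on SQLite via Python (window functions need SQLite 3.25 or later). The table is reduced to three columns and the rows are made up: account 100 has its highest id on an older date. The NOT EXISTS variant shown compares the date first and the id only as a tie-breaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (id INTEGER PRIMARY KEY, account_id INTEGER, payment_date TEXT);
    INSERT INTO payments VALUES
        (1, 100, '2020-01-05'),
        (2, 100, '2020-01-05'),
        (3, 100, '2020-01-01'),  -- highest id, but not the latest date
        (4, 200, '2020-02-01');
""")

# Independent MAX() values: no row of account 100 has both date 2020-01-05
# and id 3, so the account drops out of the result.
naive = conn.execute("""
    SELECT p.id
    FROM payments p
    JOIN (
        SELECT account_id, MAX(payment_date) AS d, MAX(id) AS i
        FROM payments
        GROUP BY account_id
    ) m ON m.account_id = p.account_id AND m.d = p.payment_date AND m.i = p.id
    ORDER BY p.id
""").fetchall()

# Window function: latest date first, highest id as tie-breaker.
ranked = conn.execute("""
    SELECT id FROM (
        SELECT id, ROW_NUMBER() OVER (
            PARTITION BY account_id ORDER BY payment_date DESC, id DESC) AS rn
        FROM payments
    ) x
    WHERE rn = 1
    ORDER BY id
""").fetchall()

# NOT EXISTS: keep a row only if no row of the same account beats it.
not_exists = conn.execute("""
    SELECT p.id
    FROM payments p
    WHERE NOT EXISTS (
        SELECT 1 FROM payments nx
        WHERE nx.account_id = p.account_id
          AND (nx.payment_date > p.payment_date
               OR (nx.payment_date = p.payment_date AND nx.id > p.id))
    )
    ORDER BY p.id
""").fetchall()

print(naive, ranked, not_exists)
```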

How to find duplicate records in PostgreSQL

I have a PostgreSQL database table called "user_links" which currently allows the following duplicate fields:
year, user_id, sid, cid
The unique constraint is currently the first field called "id", however I am now looking to add a constraint to make sure the year, user_id, sid and cid are all unique but I cannot apply the constraint because duplicate values already exist which violate this constraint.
Is there a way to find all duplicates?
The basic idea will be using a nested query with count aggregation:
select * from yourTable ou
where (select count(*) from yourTable inr
where inr.sid = ou.sid) > 1
You can adjust the where clause in the inner query to narrow the search.
There is another good solution for that mentioned in the comments, (but not everyone reads them):
select Column1, Column2, count(*)
from yourTable
group by Column1, Column2
HAVING count(*) > 1
Or shorter:
SELECT (yourTable.*)::text, count(*)
FROM yourTable
GROUP BY yourTable.*
HAVING count(*) > 1
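Applied to the four columns from the question, the GROUP BY ... HAVING approach could look like the sketch below. It runs on SQLite through Python's sqlite3 module rather than PostgreSQL, the sample rows are invented, and the cleanup step (keep the lowest id per group via NOT IN) is just one possible way to clear the duplicates before adding the constraint. The index name user_links_uq is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_links (
        id INTEGER PRIMARY KEY, year INTEGER, user_id INTEGER, sid INTEGER, cid INTEGER);
    -- hypothetical rows; the first three share all four columns
    INSERT INTO user_links (year, user_id, sid, cid) VALUES
        (2020, 1, 3, 4), (2020, 1, 3, 4), (2020, 1, 3, 4),
        (2020, 2, 3, 4), (2021, 1, 3, 4);
""")

# Step 1: list every combination that occurs more than once.
duplicates = conn.execute("""
    SELECT year, user_id, sid, cid, COUNT(*) AS n
    FROM user_links
    GROUP BY year, user_id, sid, cid
    HAVING COUNT(*) > 1
""").fetchall()

# Step 2: keep only the lowest id of each combination.
conn.execute("""
    DELETE FROM user_links
    WHERE id NOT IN (
        SELECT MIN(id) FROM user_links GROUP BY year, user_id, sid, cid)
""")

# Step 3: the unique index can now be created without a violation.
conn.execute(
    "CREATE UNIQUE INDEX user_links_uq ON user_links (year, user_id, sid, cid)")

remaining = [r[0] for r in conn.execute("SELECT id FROM user_links ORDER BY id")]
print(duplicates, remaining)
```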
From "Find duplicate rows with PostgreSQL", here's a smart solution:
select * from (
SELECT id,
ROW_NUMBER() OVER(PARTITION BY column1, column2 ORDER BY id asc) AS Row
FROM tbl
) dups
where
dups.Row > 1
In order to make it easier I assume that you wish to apply a unique constraint only for column year and the primary key is a column named id.
In order to find duplicate values you should run,
SELECT year, COUNT(id)
FROM YOUR_TABLE
GROUP BY year
HAVING COUNT(id) > 1
ORDER BY COUNT(id);
Using the SQL statement above you get a table which contains all the duplicate years in your table. In order to delete all the duplicates except the latest duplicate entry, use the SQL statement below.
DELETE
FROM YOUR_TABLE A USING YOUR_TABLE B
WHERE A.year=B.year AND A.id<B.id;
You can join to the same table on the fields that would be duplicated and then exclude rows that match themselves (tn1.id <> tn2.id). Select the id field from the first table alias (tn1) and then use the array_agg function on the id field of the second table alias. Finally, for the array_agg function to work properly, you will group the results by the tn1.id field. This will produce a result set that contains the id of a record and an array of all the ids that fit the join conditions.
select tn1.id,
array_agg(tn2.id) as duplicate_entries
from table_name tn1 join table_name tn2 on
tn1.year = tn2.year
and tn1.sid = tn2.sid
and tn1.user_id = tn2.user_id
and tn1.cid = tn2.cid
and tn1.id <> tn2.id
group by tn1.id;
Obviously, id's that will be in the duplicate_entries array for one id, will also have their own entries in the result set. You will have to use this result set to decide which id you want to become the source of 'truth.' The one record that shouldn't get deleted. Maybe you could do something like this:
with dupe_set as (
select tn1.id,
array_agg(tn2.id) as duplicate_entries
from table_name tn1 join table_name tn2 on
tn1.year = tn2.year
and tn1.sid = tn2.sid
and tn1.user_id = tn2.user_id
and tn1.cid = tn2.cid
and tn1.id <> tn2.id
group by tn1.id
order by tn1.id asc)
select ds.id from dupe_set ds where not exists
(select de from unnest(ds.duplicate_entries) as de where de < ds.id)
Selects the lowest number ID's that have duplicates (assuming the ID is increasing int PK). These would be the ID's that you would keep around.
Inspired by Sandro Wiggers, I did something similar:
WITH ordered AS (
SELECT id,year, user_id, sid, cid,
rank() OVER (PARTITION BY year, user_id, sid, cid ORDER BY id) AS rnk
FROM user_links
),
to_delete AS (
SELECT id
FROM ordered
WHERE rnk > 1
)
DELETE
FROM user_links
USING to_delete
WHERE user_links.id = to_delete.id;
If you want to test it, change it slightly:
WITH ordered AS (
SELECT id,year, user_id, sid, cid,
rank() OVER (PARTITION BY year, user_id, sid, cid ORDER BY id) AS rnk
FROM user_links
),
to_delete AS (
SELECT id,year,user_id,sid, cid
FROM ordered
WHERE rnk > 1
)
SELECT * FROM to_delete;
This will give an overview of what is going to be deleted (there is no problem to keep year,user_id,sid,cid in the to_delete query when running the deletion, but then they are not needed)
In your case, because of the constraint you need to delete the duplicated records.
Find the duplicated rows
Organize them by created_at date - in this case I'm keeping the oldest
Delete the records with USING to filter the right rows
WITH duplicated AS (
SELECT id,
count(*)
FROM products
GROUP BY id
HAVING count(*) > 1),
ordered AS (
SELECT p.id,
created_at,
rank() OVER (partition BY p.id ORDER BY p.created_at) AS rnk
FROM products p
JOIN duplicated d ON d.id = p.id ),
products_to_delete AS (
SELECT id,
created_at
FROM ordered
WHERE rnk > 1
)
DELETE
FROM products
USING products_to_delete
WHERE products.id = products_to_delete.id
AND products.created_at = products_to_delete.created_at;
The following GROUP BY ... HAVING query is a simple and usually efficient way to check for duplicate rows:
SELECT id, count(id)
FROM table1
GROUP BY id
HAVING count(id) > 1
begin;
create table user_links(id serial,year bigint, user_id bigint, sid bigint, cid bigint);
insert into user_links(year, user_id, sid, cid) values (null,null,null,null),
(null,null,null,null), (null,null,null,null),
(1,2,3,4), (1,2,3,4),
(1,2,3,4),(1,1,3,8),
(1,1,3,9),
(1,null,null,null),(1,null,null,null);
commit;
A set operation with DISTINCT ON and EXCEPT returns the duplicate rows:
(select id, year, user_id, sid, cid from user_links order by 1)
except
select distinct on (year, user_id, sid, cid) id, year, user_id, sid, cid
from user_links order by 1;
EXCEPT ALL also works, since the serial id makes all rows unique:
(select id, year, user_id, sid, cid from user_links order by 1)
except all
select distinct on (year, user_id, sid, cid)
id, year, user_id, sid, cid from user_links order by 1;
So far this works for both NULL and non-NULL values.
To delete the duplicates:
with a as(
(select id, year, user_id, sid, cid from user_links order by 1)
except all
select distinct on (year, user_id, sid, cid)
id, year, user_id, sid, cid from user_links order by 1)
delete from user_links using a where user_links.id = a.id returning *;

Optimizing A Slow Complicated Remote SQL Query

I find myself needing to retrieve matches for, on average, ~1.5m rows from a remote database. There are two tables (ITEM1 and ITEM2) that have dated item information. There should always be at least one record for an item in ITEM1, and there may be 0 to many records for the same item in ITEM2. I have to find the latest record from either table, and if it exists in ITEM2, use that information instead of ITEM1. #TEMPA is the table that has the initial ~1.5m ItemNumbers.
Below is the query:
SELECT GETDATE() AS DateElement, A.SourceStore, COALESCE(FR.original_cost,CO.original_cost) AS Cost
FROM #TEMPA A
INNER JOIN REMOTEDB.ITEM1 CO
ON CO.item_id = A.ItemNumber
AND CO.month_ending >= (SELECT MAX(month_ending) FROM REMOTEDB.ITEM1 CO2 WHERE CO2.item_id = A.ItemNumber)
LEFT JOIN REMOTEDB.ITEM2 FR
ON FR.item_id = A.ItemNumber
AND FR.month_ending >= (SELECT MAX(month_ending) FROM REMOTEDB.ITEM2 FR2 WHERE FR2.item_id = A.ItemNumber)
WHERE CO.item_id IS NOT NULL
OR FR.item_id IS NOT NULL
There are unique clustered indexes on item_id and month_ending on both ITEM tables. I realize the subqueries are probably a big performance hit, but I can't think of any other way to do it. Each item could potentially have a different max month_ending date. Currently it returns the correct information, but it takes ~2.6 hrs to do so. Any help in optimizing this query to perform better would be appreciated.
Edit: I should mention the query is also being run READ UNCOMMITTED already.
I tried both answer queries using ROW_NUMBER and they both ran in ~20 minutes on the remote server itself. Using my original query it finishes in ~2 minutes.
My original query runs in ~17 minutes over linked server. I cancelled the other queries once they went over an hour.
Thoughts?
Answer Queries:
http://content.screencast.com/users/CWhittem/folders/Jing/media/ed55352b-9799-4dec-94f0-764e2670884f/2014-07-09_0957.png
Original Query:
http://content.screencast.com/users/CWhittem/folders/Jing/media/4991aa7d-a05c-4fb1-afad-52b07f896d5e/2014-07-09_1014.png
Thanks!
Rewrite the Correlated Subqueries using MAX with ROW_NUMBERs:
SELECT GETDATE() AS DateElement, A.SourceStore,
COALESCE(FR.original_cost,CO.original_cost) AS Cost
FROM #TEMPA A
INNER JOIN
(
SELECT *
FROM
(
SELECT original_cost,
item_id,
ROW_NUMBER() OVER (PARTITION BY item_id ORDER BY month_ending DESC) AS rn
FROM REMOTEDB.ITEM1
) as dt
WHERE rn = 1
) AS CO
ON CO.item_id = A.ItemNumber
LEFT JOIN
(
SELECT *
FROM
(
SELECT original_cost,
item_id,
ROW_NUMBER() OVER (PARTITION BY item_id ORDER BY month_ending DESC) AS rn
FROM REMOTEDB.ITEM2
) as dt
WHERE rn = 1
) as FR
ON FR.item_id = A.ItemNumber
If it is SQL Server 2008 or newer, try this...
;With OrderedItem1 As
(
Select Row_Number() Over (Partition By item_id Order By Month_Ending Desc) As recentOrderID,
item_id,
original_cost
From REMOTEDB.ITEM1
), OrderedItem2 As
(
Select Row_Number() Over (Partition By item_id Order By Month_Ending Desc) As recentOrderID,
item_id,
original_cost
From REMOTEDB.ITEM2
), maxItem1 As
(
Select item_id,
original_cost
From OrderedItem1
Where recentOrderID = 1
), maxItem2 As
(
Select item_id,
original_cost
From OrderedItem2
Where recentOrderID = 1
)
Select GetDate() As DateElement,
A.SourceStore,
IsNull(FR.original_cost,CO.original_cost) As Cost
From #TEMPA As A
Join maxItem1 As CO
On CO.item_id = A.ItemNumber
Left Join maxItem2 FR
On FR.item_id = A.ItemNumber
... you mention in the original post that there will always be a record for every item in ITEM1 so your WHERE CO.item_id Is Not Null OR FR.item_id Is Not Null does nothing (on top of the fact you would filter them out with your inner join).
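The shape of the ROW_NUMBER rewrite can be checked on a few rows. The sketch below is not the remote setup from the question: it runs locally on SQLite through Python's sqlite3 module (3.25 or later for window functions), the tables tempa, item1 and item2 are simplified stand-ins, and all rows are invented. It says nothing about linked-server performance, only about the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tempa (ItemNumber INTEGER, SourceStore TEXT);
    CREATE TABLE item1 (item_id INTEGER, month_ending TEXT, original_cost REAL);
    CREATE TABLE item2 (item_id INTEGER, month_ending TEXT, original_cost REAL);
    INSERT INTO tempa VALUES (1, 'A'), (2, 'B');
    -- hypothetical rows: item 1 exists only in item1, item 2 in both tables
    INSERT INTO item1 VALUES (1, '2014-05-31', 10.0), (1, '2014-06-30', 11.0),
                             (2, '2014-06-30', 20.0);
    INSERT INTO item2 VALUES (2, '2014-05-31', 24.0), (2, '2014-06-30', 25.0);
""")

costs = conn.execute("""
    WITH co AS (  -- latest row per item in item1
        SELECT item_id, original_cost FROM (
            SELECT item_id, original_cost,
                   ROW_NUMBER() OVER (PARTITION BY item_id
                                      ORDER BY month_ending DESC) AS rn
            FROM item1
        ) AS ranked1
        WHERE rn = 1
    ), fr AS (    -- latest row per item in item2
        SELECT item_id, original_cost FROM (
            SELECT item_id, original_cost,
                   ROW_NUMBER() OVER (PARTITION BY item_id
                                      ORDER BY month_ending DESC) AS rn
            FROM item2
        ) AS ranked2
        WHERE rn = 1
    )
    SELECT a.SourceStore,
           COALESCE(fr.original_cost, co.original_cost) AS cost  -- item2 wins
    FROM tempa a
    JOIN co ON co.item_id = a.ItemNumber
    LEFT JOIN fr ON fr.item_id = a.ItemNumber
    ORDER BY a.ItemNumber
""").fetchall()

print(costs)
```

Item 1 falls back to its latest item1 cost, while item 2 takes the latest item2 cost.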
So after much testing and experimentation I have come up with the following that outperforms everything else I have tried:
SELECT DISTINCT oInv.Item_ID, oInv.Month_Ending, oInv.Original_Cost
FROM (
SELECT Item_ID, Month_Ending, Original_Cost
FROM ho_data.dbo.CO_Ho_Inven
UNION ALL
SELECT Item_ID, Month_Ending, Original_Cost
FROM ho_data.dbo.FR_Ho_Inven
) OInv
INNER JOIN (
SELECT UInv.Item_ID, MAX(UInv.Month_ending) AS Month_Ending, MAX(original_cost) AS original_cost
FROM (
SELECT Item_ID, Month_Ending, original_cost
FROM ho_data.dbo.CO_Ho_Inven
UNION ALL
SELECT Item_ID, Month_Ending, original_cost
FROM ho_data.dbo.FR_Ho_Inven
) UInv
GROUP BY UInv.Item_ID
) UINv
ON OInv.Item_ID = UInv.Item_ID
AND OInv.Month_Ending = UInv.Month_Ending
AND OInv.original_cost = UINv.original_cost

Select all categories and all subcategories (in each category)

I have 2 tables:
1) REPORT ( ID, CLIENT_ID, STOP_TIME )
2) REPORT_DETAILS ( ID, REPORT_ID, CLIENT_ID, PRODUCT_ID )
I need to select all pairs (CLIENT_ID, PRODUCT_ID) where STOP_TIME is the biggest.
BUT! There can be several reports for one customer that contain the same product, and this is the point...
My idea (I don't want you to do my homework; I just need advice, some direction where to look):
WITH temp_table (client_id, product_id, report_id, stop_time) AS(
SELECT distinct(rd.CLIENT_ID), rd.REPORT_ID, rd.PRODUCT_ID, r.STOP_TIME
FROM REPORT_DETAILS rd
JOIN REPORT r
ON (r.ID = rd.REPORT_ID)
)
SELECT client_id, product_id,stop_time...
where time is max? I don't know
;WITH temp_table
(client_id, product_id, report_id, stop_time)
AS
(
SELECT DISTINCT rd.CLIENT_ID, rd.PRODUCT_ID, rd.REPORT_ID, r.MaXTime
FROM REPORT_DETAILS rd
INNER JOIN (SELECT ID, MAX(STOP_TIME) AS MaXTime
FROM REPORT
GROUP BY ID) r
ON r.ID = rd.REPORT_ID
)
SELECT client_id, product_id,stop_time FROM temp_table
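If the goal is one row per (client, product) pair carrying the latest stop time of any report that contains it, which is only one reading of the question, a plain GROUP BY over the joined tables may be enough. A sketch on SQLite via Python's sqlite3 module, with the two tables as described in the question and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report (id INTEGER PRIMARY KEY, client_id INTEGER, stop_time TEXT);
    CREATE TABLE report_details (
        id INTEGER PRIMARY KEY, report_id INTEGER, client_id INTEGER, product_id INTEGER);
    -- hypothetical rows: client 7 has product 100 in two different reports
    INSERT INTO report VALUES
        (1, 7, '2014-01-01'), (2, 7, '2014-02-01'), (3, 8, '2014-01-15');
    INSERT INTO report_details VALUES
        (1, 1, 7, 100), (2, 2, 7, 100), (3, 2, 7, 101), (4, 3, 8, 100);
""")

# One row per (client, product) with the latest stop time among its reports.
pairs = conn.execute("""
    SELECT rd.client_id, rd.product_id, MAX(r.stop_time) AS stop_time
    FROM report_details rd
    JOIN report r ON r.id = rd.report_id
    GROUP BY rd.client_id, rd.product_id
    ORDER BY rd.client_id, rd.product_id
""").fetchall()

print(pairs)
```

Grouping by the pair collapses the repeated product of client 7 into a single row, so no DISTINCT is needed.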