Oracle: Delete statement becomes very slow when using a variable

I have an Oracle procedure that normally takes around 5 minutes to run. There are two areas in the procedure where I need to query a list of id numbers that may have been modified. The query in these two segments is identical. The first is at the beginning of a cursor definition, and the second is in a delete statement in the main body of the procedure. The key part that is repeated a lot involves getting the most recent date a certain table was refreshed. The entire procedure is fairly lengthy, so I'll include only the relevant bits below.
Here is the beginning of the cursor:
cursor gvg_profile_c is
  with last_refreshed as
    (select trunc(max(t.last_refreshed)) as max_date
     from ad.pbi_gvg_profile t),
  modified_ids as
  (
    select distinct id_number
    from ad.hr_giving cg
    join ad.pbi_dates d
      on d.DATE_FULL = trunc(cg.processed_date)
    where d.RELATIVE_DATE >= (select max_date from last_refreshed)
      and d.RELATIVE_DATE <= trunc(CURRENT_DATE)
      and cg.fiscal_year >= ad.current_fiscal_year - 6
    union
    select distinct gi.gift_donor_id as id_number
    from ad.gift gi
    where gi.date_added >= (select max_date from last_refreshed)
       or gi.date_modified >= (select max_date from last_refreshed)
    union
    select distinct p.pledge_donor_id as id_number
    from ad.pledge_rev p
    where p.date_added >= (select max_date from last_refreshed)
       or p.date_modified >= (select max_date from last_refreshed)
    union
    select distinct a.id_number
    from ad.affiliation a
    where a.date_added >= (select max_date from last_refreshed)
       or a.date_modified >= (select max_date from last_refreshed)
  ),
Here is the delete statement:
delete from ad.pbi_gvg_profile p
where p.id_number in
  (with last_refreshed as
     (select trunc(max(t.last_refreshed)) as max_date
      from ad.pbi_gvg_profile t)
   select distinct id_number
   from ad.hr_giving cg
   join ad.pbi_dates d
     on d.DATE_FULL = trunc(cg.processed_date)
   where d.RELATIVE_DATE >= (select max_date from last_refreshed)
     and d.RELATIVE_DATE <= trunc(CURRENT_DATE)
     and cg.fiscal_year >= ad.current_fiscal_year - 6
   union
   select distinct gi.gift_donor_id as id_number
   from ad.gift gi
   where gi.date_added >= (select max_date from last_refreshed)
      or gi.date_modified >= (select max_date from last_refreshed)
   union
   select distinct p.pledge_donor_id as id_number
   from ad.pledge_rev p
   where p.date_added >= (select max_date from last_refreshed)
      or p.date_modified >= (select max_date from last_refreshed)
   union
   select distinct a.id_number
   from ad.affiliation a
   where a.date_added >= (select max_date from last_refreshed)
      or a.date_modified >= (select max_date from last_refreshed)
  );
I thought it would make more sense and be more efficient to have a function that calculates the last_refreshed date and puts it in a variable that can be reused. So I did that. Here's the function:
NOTE: You can see here that I sometimes need to get the date from a different table. That is accounted for in the above code via an if/else; I just left it out so you don't have to read identical code four times.
function get_last_refreshed_date(scope in smallint) return date is
  last_refreshed date;
  table_name     varchar2(30);
  select_sql     varchar2(255);
begin
  if scope = c_scope_all then
    table_name := 'ad.pbi_gvg_profile';
  else
    table_name := 'ad.pbi_gvg_profile_ag';
  end if;

  select_sql := 'select trunc(max(last_refreshed)) from ' || table_name;

  execute immediate select_sql
    into last_refreshed;

  return last_refreshed;
end get_last_refreshed_date;
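As an aside, since the table name can only be one of two known tables, the same lookup could also be written with static SQL instead of EXECUTE IMMEDIATE. This is only a sketch for comparison (same signature; the procedure itself uses the dynamic version above):
static_sql_variant:
function get_last_refreshed_date(scope in smallint) return date is
  last_refreshed date;
begin
  -- illustrative static-SQL variant: one query per known table, no dynamic SQL
  if scope = c_scope_all then
    select trunc(max(t.last_refreshed)) into last_refreshed
    from ad.pbi_gvg_profile t;
  else
    select trunc(max(t.last_refreshed)) into last_refreshed
    from ad.pbi_gvg_profile_ag t;
  end if;
  return last_refreshed;
end get_last_refreshed_date;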
Then I changed the beginning of my cursor to use this function and a variable instead, so now it looks like this:
last_refreshed date := get_last_refreshed_date(scope);

cursor gvg_profile_c is
  with modified_ids as
  (
    select distinct id_number
    from ad.hr_giving cg
    join ad.pbi_dates d
      on d.DATE_FULL = trunc(cg.processed_date)
    where d.RELATIVE_DATE >= last_refreshed
      and d.RELATIVE_DATE <= trunc(CURRENT_DATE)
      and cg.fiscal_year >= ad.current_fiscal_year - 6
    union
    select distinct gi.gift_donor_id as id_number
    from ad.gift gi
    where gi.date_added >= last_refreshed
       or gi.date_modified >= last_refreshed
    union
    select distinct p.pledge_donor_id as id_number
    from ad.pledge_rev p
    where p.date_added >= last_refreshed
       or p.date_modified >= last_refreshed
    union
    select distinct a.id_number
    from ad.affiliation a
    where a.date_added >= last_refreshed
       or a.date_modified >= last_refreshed
  ),
This works fine. My procedure still only takes 5 minutes to run.
However, when I made the same change to the delete statement, the procedure suddenly took about 65 minutes to run. And I can tell from logging that the execution of the delete statement accounts for over 60 of those minutes.
Why would the variable work fine in the cursor, but cause a major slowdown in a delete statement in the body of the procedure?
Any help is greatly appreciated!
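One way to see what actually changed, assuming access to V$SQL and DBMS_XPLAN (the LIKE pattern below is only illustrative), is to pull the cached plans for both versions of the delete and compare their cardinality estimates:
-- Illustrative diagnostic only, not part of the original procedure.
-- Locate the cached delete statement(s):
select sql_id, child_number, elapsed_time, rows_processed, sql_text
from   v$sql
where  upper(sql_text) like 'DELETE FROM AD.PBI_GVG_PROFILE%';

-- Show the plan for a given sql_id. Actual row counts appear only if the
-- statement ran with statistics_level = ALL or a gather_plan_statistics hint;
-- otherwise use the 'TYPICAL' format instead of 'ALLSTATS LAST'.
select *
from   table(dbms_xplan.display_cursor('&sql_id', null, 'ALLSTATS LAST'));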

Related

SQL with while loop to DAX conversion

I'm trying to convert SQL code with a WHILE loop into DAX. I'm trying to build this query without using temp tables, since access to the database is an issue and I only have views to work with. I believe the best option for me is to code it in DAX. Could someone help with it?
DECLARE @sd DATETIME
DECLARE @ed DATETIME
SELECT @sd = CONVERT(DATETIME, '2021-01-31')
SELECT @ed = GETDATE()
DECLARE @date DATETIME = EOMONTH(@sd)

WHILE (@date <= @ed)
BEGIN
    SELECT MONTH(@date) as Month, YEAR(@date) as Year, DAY(@date) as Day, A.*
    FROM [people] A
    WHERE A.effective_date = (SELECT MAX(B.effective_date)
                              FROM [people] B
                              WHERE B.employee_id = A.employee_id
                                AND B.record_id = A.record_id
                                AND B.effective_date <= @date)
      AND A.effective_sequence = (SELECT MAX(C.effective_sequence)
                                  FROM [people] C
                                  WHERE C.employee_id = A.employee_id
                                    AND C.record_id = A.record_id
                                    AND C.effective_date = A.effective_date)
    ORDER BY A.employee_id;

    SET @date = EOMONTH(DATEADD(MONTH, 1, @date))
END
While you could do this as a view, you would either have to hard-code the start and end dates or filter them afterwards (which is likely to be inefficient). Instead, you can do this as an inline table-valued function.
We can use a virtual tally table (generated with a couple of cross joins) to generate a row for each month.
We can use row-numbering instead of the two subqueries.
CREATE FUNCTION dbo.GetData (@sd DATETIME, @ed DATETIME)
RETURNS TABLE AS RETURN

WITH L0 AS (
    SELECT *
    FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) v(n)
),
L1 AS (
    SELECT 1 n FROM L0 a CROSS JOIN L0 b
)
SELECT
    MONTH(p.Month) as Month,
    YEAR(p.Month) as Year,
    DAY(p.Month) as Day,
    p.* -- specify columns
FROM (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY m.Month, p.employee_id, p.record_id
                           ORDER BY p.effective_date DESC, p.effective_sequence DESC) AS rn
    FROM [people] p
    CROSS JOIN (
        SELECT TOP (DATEDIFF(month, @sd, @ed) + 1)
            DATEADD(month, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1, EOMONTH(@sd)) AS Month
        FROM L1
    ) m
    WHERE p.effective_date <= m.Month
) p
WHERE p.rn = 1
;
Then in PowerBI you can just do for example
SELECT *
FROM dbo.GetData ('2021-01-31', GETDATE()) d
ORDER BY
d.employee_id
Note that you cannot put the ORDER BY inside the function; it doesn't work there.

Find increase in history records in specific range

I want to find records in the date range 1/1/19-1/7/19 where the amount increased, using the table HISTORY with columns:
DATE (date), AMOUNT (number), ID (varchar2(30))
I can find the IDs inside the range correctly, assuming an increase/decrease can only happen when there are two records with the same ID:
with suspect as
  (select id
   from history
   where history.date < to_date('2019-07-01', 'yyyy-mm-dd')
   group by id
   having count(1) > 1),
ids as
  (select history.id
   from history
   join suspect
     on history.id = suspect.id
   where history.date > to_date('2019-01-01', 'yyyy-mm-dd')
     and history.date < to_date('2019-07-01', 'yyyy-mm-dd'))
select count(distinct a.id)
from history a, history b
where a.id = b.id
  and a.date < b.date
  and a.amount < b.amount
The problem: to find an increase I need the previous record, which can lie before the time range.
I can find the last time before the range, but I failed to use it:
ids_prevtime as (
  select history.*, max(history.date) over (partition by history.id) max_date
  from history
  join ids on history.id = ids.id
  where history.date < to_date('2019-01-01', 'yyyy-mm-dd')
), ids_prev as (
  select * from ids_prevtime where date = max_date
)
I see that you found a solution, but maybe you could do it more simply, using lag():
select count(distinct id)
from (select id, date_, amount,
lag(amount) over (partition by id order by date_) prev_amt
from history)
where date_ between date '2019-01-01' and date '2019-07-01'
and amount > prev_amt;
dbfiddle
Add a union of the last history records before the range with the records inside the range:
ids_prev as
(select ID, DATE, AMOUNT
from id_before_rangetime
where createddate = max_date),
ids_in_range as
(select history.*
from history
join ids
on history.ID = ids.ID
where history.date > to_date('2019-01-01', 'yyyy-mm-dd')
and history.date < to_date('2019-07-01', 'yyyy-mm-dd')),
all_relevant as
(select * from ids_in_range union all select * from ids_prev)
and then count increases:
select count(distinct id)
from all_relevant a, all_relevant b
where a.id = b.id
and a.date < b.date
and a.amount < b.amount

Slow running query, PostgreSQL

I have a very slow query (30+ minutes) that I think can be sped up with more efficient coding. Below is the code and the query plan that results. I am looking for ways to speed up this query, which performs several joins on large tables.
drop table if exists totalshad;
create temporary table totalshad as
select pricedate, hour, sum(cast(price as numeric)) as totalprice
from pjm.rtcons
where rtcons.pricedate >= '2017-12-01'
-- and rtcons.pricedate <= '2018-01-23'
group by pricedate, hour
order by pricedate, hour;
-----------------------------
drop table if exists percshad;
create temporary table percshad as
select totalshad.pricedate, totalshad.hour, facility,
       round(sum(cast(price as numeric)), 2) as cons_shad,
       round(sum(cast(totalprice as numeric)), 2) as total_shad,
       round(cast(price/totalprice as numeric), 4) as per_shad
from totalshad
join pjm.rtcons
  on rtcons.pricedate = totalshad.pricedate
 and rtcons.hour = totalshad.hour
 and facility = 'ETOWANDA-NMESHOPP ETL 1057 A 115 KV'
where totalprice <> 0 and totalshad.pricedate > '2017-12-01'
group by totalshad.pricedate, totalshad.hour, facility, (price/totalprice)
order by per_shad desc
limit 5;

EXPLAIN select facility, percshad.pricedate, percshad.hour, per_shad,
       minmcc.rtmcc, minnode.nodename, maxmcc.rtmcc, maxnode.nodename
from percshad
join pjm.prices minmcc
  on minmcc.pricedate = percshad.pricedate
 and minmcc.hour = percshad.hour
 and minmcc.rtmcc = (select min(rtmcc) from pjm.prices
                     where pricedate = percshad.pricedate and hour = percshad.hour)
join pjm.nodes minnode
  on minnode.node_id = minmcc.node_id
join pjm.prices maxmcc
  on maxmcc.pricedate = percshad.pricedate
 and maxmcc.hour = percshad.hour
 and maxmcc.rtmcc = (select max(rtmcc) from pjm.prices
                     where pricedate = percshad.pricedate and hour = percshad.hour)
join pjm.nodes maxnode
  on maxnode.node_id = maxmcc.node_id
order by per_shad desc
limit 5
And here is the EXPLAIN output:
UPDATE: I have now simplified my code down to the following. But as can be seen from the EXPLAIN, it still takes forever to find the node_id in the last select statement.
drop table if exists totalshad;
create temporary table totalshad as
select pricedate, hour, sum(cast(price as numeric)) as totalprice
from pjm.rtcons
where rtcons.pricedate >= '2017-12-01'
-- and rtcons.pricedate <= '2018-01-23'
group by pricedate, hour
order by pricedate, hour;
-----------------------------
drop table if exists percshad;
create temporary table percshad as
select totalshad.pricedate, totalshad.hour, facility,
       round(sum(cast(price as numeric)), 2) as cons_shad,
       round(sum(cast(totalprice as numeric)), 2) as total_shad,
       round(cast(price/totalprice as numeric), 4) as per_shad
from totalshad
join pjm.rtcons
  on rtcons.pricedate = totalshad.pricedate
 and rtcons.hour = totalshad.hour
 and facility = 'ETOWANDA-NMESHOPP ETL 1057 A 115 KV'
where totalprice <> 0 and totalshad.pricedate > '2017-12-01'
group by totalshad.pricedate, totalshad.hour, facility, (price/totalprice)
order by per_shad desc
limit 5;

drop table if exists mincong;
create temporary table mincong as
select pricedate, hour, min(rtmcc) as rtmcc
from pjm.prices JOIN percshad USING (pricedate, hour)
group by pricedate, hour;

EXPLAIN select distinct on (pricedate, hour) prices.node_id
from mincong
JOIN pjm.prices USING (pricedate, hour, rtmcc)
group by pricedate, hour, node_id
The problem is the subselects in the join condition; they have to be executed for every joined row.
If you cannot get rid of them, try to create an index that will support the subselects as well as possible:
CREATE INDEX ON pjm.prices(pricedate, hour, rtmcc);
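If getting rid of them is an option, one common rewrite (a sketch only, reusing the table and column names from the question) is to aggregate pjm.prices once per (pricedate, hour) and join against that, so the min and max are computed a single time rather than once per joined row:
-- Sketch: precompute the per-hour extremes, then join them back in.
with extremes as (
    select pricedate, hour,
           min(rtmcc) as min_rtmcc,
           max(rtmcc) as max_rtmcc
    from   pjm.prices
    group  by pricedate, hour
)
select facility, percshad.pricedate, percshad.hour, per_shad,
       minmcc.rtmcc, minnode.nodename, maxmcc.rtmcc, maxnode.nodename
from   percshad
join   extremes   on extremes.pricedate = percshad.pricedate
                 and extremes.hour      = percshad.hour
join   pjm.prices minmcc on minmcc.pricedate = percshad.pricedate
                        and minmcc.hour      = percshad.hour
                        and minmcc.rtmcc     = extremes.min_rtmcc
join   pjm.nodes  minnode on minnode.node_id = minmcc.node_id
join   pjm.prices maxmcc on maxmcc.pricedate = percshad.pricedate
                        and maxmcc.hour      = percshad.hour
                        and maxmcc.rtmcc     = extremes.max_rtmcc
join   pjm.nodes  maxnode on maxnode.node_id = maxmcc.node_id
order  by per_shad desc
limit  5;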

alternatives to "Having"

I have a SELECT statement that counts the number of instances and then saves the result in a variable. It has a HAVING clause that does a SUM and a COUNT. However, since you must have a GROUP BY in order to use HAVING, the select statement returns 4 rows, each with a count of 1, instead of a single total of 4. The count is then saved into the variable as 1 rather than 4, which is not what I need, so I am looking for a workaround.
select count(distinct p1.community)
from
"Database".prospect p1
where
p1.visit_date >= '2013-07-01'
and p1.visit_date <= '2013-09-30'
and p1.division = '61'
group By
p1.community
having
sum(p1.status_1) / count(p1.control_code) >= .16;
This is a reasonable alternative:
select count(*)
from (
select p1.community , sum(p1.status_1) / count(p1.control_code) SomeColumn
from
"Database".prospect p1
where
p1.visit_date >= '2013-07-01'
and p1.visit_date <= '2013-09-30'
and p1.division = '61'
Group By
p1.community
) A
where A.SomeColumn >= .16;

Running sum with aggregate function

I am retrieving the results of the mlog table and calculating the running subtotal of qtyn with the help of code 1 below. I am stuck on how to combine the criteria from my second snippet with the first.
Thanks for any help.
1.
SELECT autn, date, itcode, qtyn, out,
date, phstock,
qtyn + COALESCE(
(SELECT SUM(qtyn) FROM dbo.mlog b
WHERE b.autn < a.autn
AND itcode = '40'), 0) AS balance
FROM dbo.mlog a
WHERE (itcode = '40')
ORDER BY autn
2.
date >=(SELECT MAX([date]) FROM mlog)
To append a condition to the query, use AND or OR. E.g.:
SELECT a.autn, a.date, a.itcode, a.qtyn, a.out,
a.date, a.phstock,
a.qtyn + COALESCE(
(SELECT SUM(b.qtyn) FROM dbo.mlog b
WHERE b.autn < a.autn
AND b.itcode = '40'), 0) AS balance
FROM dbo.mlog a
WHERE (a.itcode = '40' AND a.date >= (SELECT MAX(c.date) FROM mlog c) )
ORDER BY a.autn
Not tested, but should do what you want
I have heard that SQL Server is rather inefficient with coalesce(), because it runs the first part twice. Here is an alternative way of writing this:
with ml as (
    SELECT ml.autn, ml.date, ml.itcode, ml.qtyn, ml.out, ml.phstock
    FROM dbo.mlog ml
    WHERE ml.itcode = '40' AND ml.date >= (SELECT MAX(ml1.date) FROM mlog ml1)
)
select ml.*,
       (select sum(ml1.qtyn) from ml ml1 where ml1.autn <= ml.autn) as balance
from ml
ORDER BY ml.autn
I also wonder if the where clause would be more efficient as:
WHERE ml.itcode = '40' AND ml.date = (SELECT top 1 ml1.date FROM mlog ml1 order by ml1.date desc)
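As an aside, on SQL Server 2012 or later a windowed SUM avoids the correlated subquery for the running balance altogether. This is only a sketch based on the columns shown in the question, not tested against the real table:
-- Sketch: running total of qtyn in autn order (assumes SQL Server 2012+).
SELECT a.autn, a.date, a.itcode, a.qtyn, a.out, a.phstock,
       SUM(a.qtyn) OVER (ORDER BY a.autn ROWS UNBOUNDED PRECEDING) AS balance
FROM dbo.mlog a
WHERE a.itcode = '40'
  AND a.date >= (SELECT MAX(c.date) FROM dbo.mlog c)
ORDER BY a.autn;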