The following SQL:
SELECT *
FROM Transaction_Auth_Series t
WHERE t.Auth_ID =
(
SELECT MAX(p.Session_ID)
FROM Clone_Db_Derective p
WHERE p.Date = trunc(sysdate)
AND p.Regularity = 'THEME'
);
is very slow when the referred tables contain about 300 million rows. But it's just a matter of a few seconds when the SQL is written with two cursors, i.e.
CURSOR GetMaxValue IS
SELECT MAX(p.Session_ID)
FROM Clone_Db_Derective p
WHERE p.Date = trunc(sysdate)
AND p.Regularity = 'THEME';
CURSOR GetAllItems(temp VARCHAR2) IS
SELECT *
FROM Transaction_Auth_Series t
WHERE t.Auth_ID = temp;
and
........
FOR item in GetMaxValue LOOP
FOR itemx in GetAllItems(item.aaa) LOOP.......
Joins won't work as the tables are not related. How can we optimize the main SQL above, please?
For this query:
SELECT t.*
FROM Transaction_Auth_Series t
WHERE t.Auth_ID = (SELECT MAX(p.Session_ID)
FROM Clone_Db_Derective p
WHERE p.Date = trunc(sysdate) AND p.Regularity = 'THEME'
);
I would recommend indexes on Clone_Db_Derective(Regularity, Date, Session_ID) and Transaction_Auth_Series(Auth_Id).
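For reference, a minimal sketch of that DDL (index names are illustrative; note that DATE is a reserved word in Oracle, so a column actually named Date would need a quoted identifier):
CREATE INDEX clone_db_derective_ix
    ON Clone_Db_Derective (Regularity, "DATE", Session_ID);
CREATE INDEX transaction_auth_series_ix
    ON Transaction_Auth_Series (Auth_ID);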
The optimization for this query (assuming the tables are not views) seems pretty simple. I am surprised that the cursor version is so much faster.
WITH max_session
AS (SELECT MAX (p.Session_ID) id
FROM Clone_Db_Derective p
WHERE p.Date = TRUNC (SYSDATE) AND p.Regularity = 'THEME')
SELECT *
FROM Transaction_Auth_Series t
WHERE t.Auth_ID = (SELECT id FROM max_session)
A WITH clause is most valuable when the result of the WITH query is needed more than once in the body of the main query, for example when one averaged value needs to be compared two or three times. The point is to minimize the number of accesses to a table that would otherwise be joined into the query multiple times.
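As a contrived sketch of that reuse (table and column names here are hypothetical), the average below is computed once but read twice:
WITH avg_amount AS (
    SELECT AVG(amount) AS amt
    FROM orders
)
SELECT o.*
FROM orders o
WHERE o.amount > 2 * (SELECT amt FROM avg_amount)
   OR o.amount < (SELECT amt FROM avg_amount) / 2;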
I have a very large table of data, 1+ billion rows. If I try to join that table to itself to do a comparison, the estimated plan cost is so high (cost: 226831405289150) that the query is effectively unrunnable. Is there a way I can achieve the same results as the query below without a join, perhaps an over partition?
What I need to do is make sure another event did not happen within 24 hours before or after the one with the WILDCARE description was received.
Thanks so much for your help!
select e2.SYSTEM_NO,
min(e2.DT) as dt
from SYSTEM_EVENT e2
inner join table1.event el2
on el2.event_id = e2.event_id
left join ( Select se.DT
from SYSTEM_EVENT se
where
--fails
( se.event_id in ('101','102','103','104')
--restores
or se.event_id in ('106','107','108','109')
)
) e3
on e3.dt-e2.dt between .0001 and 1
or e3.dt-e2.dt between -1 and .0001
where el2.descr like '%WILDCARE%'
and e3.dt is null
and e2.REC_STS_CD = 'A'
group by e2.SYSTEM_NO
Without any test data it is difficult to determine what you are trying to achieve, but it appears you could try using an analytic function with a range window:
SELECT system_no,
       MIN( dt ) AS dt
FROM (
    SELECT system_no,
           dt,
           event_id,    -- needed for the EXISTS correlation below
           rec_sts_cd,  -- needed for the outer filter
           COUNT(
             CASE
               WHEN ( event_id in ('101','102','103','104') --fails
                   OR event_id in ('106','107','108','109') ) --restores
               THEN 1
             END
           ) OVER (
             ORDER BY dt
             RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
           ) AS num
    FROM system_event
  ) se
WHERE num = 0
AND   rec_sts_cd = 'A'
AND   EXISTS(
        SELECT 1
        FROM table1.event te
        WHERE te.descr like '%WILDCARE%'
        AND   te.event_id = se.event_id
      )
GROUP BY system_no
This is not a direct answer to your question, but it is a bit too long for a comment.
How old can newly inserted data be? A 48-hour window means you only need to check a subset of the data, not the whole 1-billion-row table, if data is inserted incrementally. If it is, reduce the data being compared with a WITH clause or a temporary table, as sketched below.
If you still need to compare across the whole table, I would go for partitioning by event_id, or by another attribute if it makes a better partition key, and compare each group separately.
where el2.descr like '%WILDCARE%' is a performance killer for such a huge table.
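A rough sketch of that reduction, assuming new rows always arrive with dt near SYSDATE, so only the last 48 hours can contain rows that still need checking (the window predicate is the point here, not the exact shape of the query):
WITH recent AS (
    SELECT event_id, system_no, dt, rec_sts_cd
    FROM system_event
    WHERE dt >= SYSDATE - 2   -- 48h window: older rows cannot be affected
)
SELECT e2.system_no, MIN(e2.dt) AS dt
FROM recent e2
WHERE e2.rec_sts_cd = 'A'
GROUP BY e2.system_no;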
I have a table which stores account changes over time. I need to join that up with two other tables to create some records for a particular day, if those records don't already exist.
To make things easier (I hope), I've encapsulated the query that returns the correct historical data into a function that takes in an account id, and the day.
If I execute "Select * From account_servicetier_for_day(20424, '2014-08-12')", I get the expected result (all the data returned from the function in separate columns). If I use the function within another query, I get all the columns joined into one:
("2014-08-12 14:20:37",hollenbeck,691,12129,20424,69.95,"2Mb/1Mb 20GB Limit",2048,1024,20.000)
I'm using "PostgreSQL 9.2.4 on x86_64-slackware-linux-gnu, compiled by gcc (GCC) 4.7.1, 64-bit".
Query:
Select
'2014-08-12' As day, 0 As inbytes, 0 As outbytes, acct.username, acct.accountid, acct.userid,
account_servicetier_for_day(acct.accountid, '2014-08-12')
From account_tab acct
Where acct.isdsl = 1
And acct.dslservicetypeid Is Not Null
And acct.accountid Not In (Select accountid From dailyaccounting_tab Where Day = '2014-08-12')
Order By acct.username
Function:
CREATE OR REPLACE FUNCTION account_servicetier_for_day(_accountid integer, _day timestamp without time zone) RETURNS setof account_dsl_history_info AS
$BODY$
DECLARE _accountingrow record;
BEGIN
Return Query
Select * From account_dsl_history_info
Where accountid = _accountid And timestamp <= _day + interval '1 day - 1 millisecond'
Order By timestamp Desc
Limit 1;
END;
$BODY$ LANGUAGE plpgsql;
Generally, to decompose rows returned from a function and get individual columns:
SELECT * FROM account_servicetier_for_day(20424, '2014-08-12');
As for the query:
Postgres 9.3 or newer
Cleaner with JOIN LATERAL:
SELECT '2014-08-12' AS day, 0 AS inbytes, 0 AS outbytes
, a.username, a.accountid, a.userid
, f.* -- but avoid duplicate column names!
FROM account_tab a
, account_servicetier_for_day(a.accountid, '2014-08-12') f -- <-- HERE
WHERE a.isdsl = 1
AND a.dslservicetypeid IS NOT NULL
AND NOT EXISTS (
SELECT FROM dailyaccounting_tab
WHERE day = '2014-08-12'
AND accountid = a.accountid
)
ORDER BY a.username;
The LATERAL keyword is implicit here; functions can always refer to earlier FROM items. The manual:
LATERAL can also precede a function-call FROM item, but in this
case it is a noise word, because the function expression can refer to
earlier FROM items in any case.
Related:
Insert multiple rows in one table based on number in another table
Short notation with a comma in the FROM list is (mostly) equivalent to a CROSS JOIN LATERAL (same as [INNER] JOIN LATERAL ... ON TRUE) and thus removes rows from the result where the function call returns no row. To retain such rows, use LEFT JOIN LATERAL ... ON TRUE:
...
FROM account_tab a
LEFT JOIN LATERAL account_servicetier_for_day(a.accountid, '2014-08-12') f ON TRUE
...
Also, don't use NOT IN (subquery) when you can avoid it. It's the slowest and most tricky of several ways to do that:
Select rows which are not present in other table
I suggest NOT EXISTS instead.
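A minimal illustration of the difference, using the tables from this question (the NULL behavior is standard SQL semantics, not specific to Postgres):
-- If any dailyaccounting_tab.accountid is NULL, NOT IN returns no rows at all:
SELECT a.accountid
FROM   account_tab a
WHERE  a.accountid NOT IN (SELECT accountid FROM dailyaccounting_tab);

-- NOT EXISTS ignores NULLs and behaves as expected:
SELECT a.accountid
FROM   account_tab a
WHERE  NOT EXISTS (
   SELECT FROM dailyaccounting_tab d
   WHERE  d.accountid = a.accountid
);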
Postgres 9.2 or older
You can call a set-returning function in the SELECT list (which is a Postgres extension of standard SQL). For performance reasons, this is best done in a subquery. Decompose the (well-known!) row type in the outer query to avoid repeated evaluation of the function:
SELECT '2014-08-12' AS day, 0 AS inbytes, 0 AS outbytes
, a.username, a.accountid, a.userid
, (a.rec).* -- but be wary of duplicate column names!
FROM (
SELECT *, account_servicetier_for_day(a.accountid, '2014-08-12') AS rec
FROM account_tab a
WHERE a.isdsl = 1
AND a.dslservicetypeid Is Not Null
AND NOT EXISTS (
SELECT FROM dailyaccounting_tab
WHERE day = '2014-08-12'
AND accountid = a.accountid
)
) a
ORDER BY a.username;
Related answer by Craig Ringer with an explanation, why it's better not to decompose on the same query level:
How to avoid multiple function evals with the (func()).* syntax in an SQL query?
Postgres 10 removed some oddities in the behavior of set-returning functions in the SELECT:
What is the expected behaviour for multiple set-returning functions in SELECT clause?
Use the function in the FROM clause:
Select
'2014-08-12' As day,
0 As inbytes,
0 As outbytes,
acct.username,
acct.accountid,
acct.userid,
asfd.*
From
account_tab acct
cross join lateral
account_servicetier_for_day(acct.accountid, '2014-08-12') asfd
Where acct.isdsl = 1
And acct.dslservicetypeid Is Not Null
And acct.accountid Not In (Select accountid From dailyaccounting_tab Where Day = '2014-08-12')
Order By acct.username
I have a table with (eg) 1500 rows. I have determined that (due to the resources of the server) my query can quickly process up to 500 rows of my table in my desired query. Any more than 500 rows at once and it suddenly becomes very slow.
How can I structure a query to process the contents of a table in row groups of 500, through to the end of the table?
[EDIT] The query which takes a long time is this:
select p.childid, max(c.childtime) ChildTime
from child c
inner join #parents p on p.parentid = c.parentid
and c.ChildTypeID = 1
AND c.childtime < getdate()
group by p.childid
The problem is that the child table has millions of rows and (for reasons I can't go into here) can't be reduced.
The main problem is: reducing the number of rows from the child table to make the query performant. Unfortunately, this query is being performed to populate a temporary table so that a subsequent query can execute quickly.
One possibility is to use window functions. The only trick is that they cannot be used in the WHERE clause, so you will have to use subqueries.
Select a.*, b.*
From
(
Select *, rownum = ROW_NUMBER() over (order by fieldx)
From TableA
) a
Inner Join TableB b on a.fieldx=b.fieldx
Where a.rownum between @startnum and @endnum
If this is just for processing you might be able to do something like this:
DECLARE @RowsPerPage INT = 500, @PageNumber INT = 1
DECLARE @TotalRows INT
SET @TotalRows = (SELECT count(1) from test)
-- loop until the final (possibly partial) page has been fetched
WHILE ((@PageNumber - 1) * @RowsPerPage < @TotalRows)
BEGIN
SELECT
Id
FROM
(
SELECT
Id,
ROW_NUMBER() OVER (ORDER BY Id) as RowNum
FROM Test
) AS sub
WHERE sub.RowNum BETWEEN ((@PageNumber-1)*@RowsPerPage)+1
AND @RowsPerPage*(@PageNumber)
SET @PageNumber = @PageNumber + 1
END
This calculates the total number of rows, then loops and pages through the results. It's not too helpful if you need your results together, though, because this will run X number of separate queries. You might be able to put this in a stored procedure and union the results or something crazy like that to get the results in one "query".
I still think a better option would be to fix the slowness in your original query. Is it possible to load the join results into a CTE/Temp table to only perform calculations a single time? If you could give an example of your query that would help a lot...
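For illustration, a hypothetical temp-table version of the query from the question (#child_rows is an invented name), so the millions of child rows are scanned only once:
-- Materialize the heavy join once...
SELECT p.childid, c.childtime
INTO #child_rows
FROM child c
INNER JOIN #parents p ON p.parentid = c.parentid
WHERE c.ChildTypeID = 1
AND c.childtime < GETDATE();

-- ...then this aggregate, and any subsequent queries, read the much smaller temp table.
SELECT childid, MAX(childtime) AS ChildTime
FROM #child_rows
GROUP BY childid;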
I have this query in T-SQL 2008:
SELECT a.Amount / (
SELECT SUM(b.Amount) FROM Revenue b
WHERE YEAR(b.RevenueDate) = YEAR(a.ExpenseDate)
AND MONTH(b.RevenueDate) = MONTH(a.ExpenseDate)
AND b.HotelKey = a.HotelKey
)
FROM Expense a
The problem is it takes too long to finish the query. I think it's caused by the subquery "SELECT SUM(b.Amount) FROM Revenue b..." which is executed for each row in table Expense.
How to optimize that kind of query? Is there any better alternative for the query?
EDIT: I'm sorry, I forgot the "AND b.HotelKey = a.HotelKey" clause in the subquery. The original query above has been updated.
@Damien:
Here is your query added with HotelKey join:
SELECT
a.Amount / b.Amount
FROM
Expense a
inner join
(SELECT
HotelKey,
DATEADD(month,DATEDIFF(month,0,RevenueDate),0) as MonthStart,
DATEADD(month,1+DATEDIFF(month,0,RevenueDate),0) as MonthEnd,
SUM(Amount) as Amount
FROM
Revenue
GROUP BY
HotelKey,
DATEADD(month,DATEDIFF(month,0,RevenueDate),0),
DATEADD(month,1+DATEDIFF(month,0,RevenueDate),0)
) b
ON
a.ExpenseDate >= b.MonthStart and
a.ExpenseDate < b.MonthEnd
and a.HotelKey = b.HotelKey
Try to change the where clause in your inner query to this:
where b.RevenueDate >= dateadd(month, datediff(month, 0, a.ExpenseDate), 0) and
b.RevenueDate < dateadd(month, 1+datediff(month, 0, a.ExpenseDate), 0)
It will give the query a chance to use an index on Revenue.RevenueDate if you have one.
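For instance (a sketch; the INCLUDE column is an assumption, so that the monthly SUM can be served from the index alone):
CREATE INDEX IX_Revenue_HotelKey_RevenueDate
    ON Revenue (HotelKey, RevenueDate)
    INCLUDE (Amount);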
If you're using a lot of the rows in Revenue to satisfy this query, you might do better to do a single subquery that computes all of the totals. (Also, using Mikael's suggestion for allowing some indexing to occur):
SELECT
a.Amount / b.Amount
FROM
Expense a
inner join
(SELECT
DATEADD(month,DATEDIFF(month,0,RevenueDate),0) as MonthStart,
DATEADD(month,1+DATEDIFF(month,0,RevenueDate),0) as MonthEnd,
SUM(Amount) as Amount
FROM
Revenue
GROUP BY
DATEADD(month,DATEDIFF(month,0,RevenueDate),0),
DATEADD(month,1+DATEDIFF(month,0,RevenueDate),0)
) b
ON
a.ExpenseDate >= b.MonthStart and
a.ExpenseDate < b.MonthEnd
You don't specify how big the tables are, but you can make the query faster by creating a computed column (and indexing it) from the year-month combination in table Revenue, and in table Expense as well (if that table is not very small). These computed columns (and their indexes) would then be used to join the two tables.
See: Computed Columns
and: Creating Indexes on Computed Columns
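A minimal sketch of that approach for the Revenue table (column and index names are illustrative):
-- Persisted computed year-month column, so it can be indexed and used as a join key.
ALTER TABLE Revenue
    ADD RevenueYearMonth AS (YEAR(RevenueDate) * 100 + MONTH(RevenueDate)) PERSISTED;

CREATE INDEX IX_Revenue_YearMonth
    ON Revenue (RevenueYearMonth);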
You could try calculating the two sums first and then joining them together afterwards.
SELECT a.ExpenseAmount / b.RevenueAmount
FROM
(
SELECT SUM(Expense.Amount) As ExpenseAmount,
YEAR(Expense.ExpenseDate) AS ExpenseYear,
MONTH(Expense.ExpenseDate) AS ExpenseMonth
FROM Expense
GROUP BY
YEAR(Expense.ExpenseDate),
MONTH(Expense.ExpenseDate)
) AS a INNER JOIN
(
SELECT SUM(Revenue.Amount) AS RevenueAmount,
YEAR(Revenue.RevenueDate) AS RevenueYear,
MONTH(Revenue.RevenueDate) AS RevenueMonth
FROM Revenue
GROUP BY YEAR(Revenue.RevenueDate), MONTH(Revenue.RevenueDate)
) as b ON a.ExpenseYear = b.RevenueYear AND a.ExpenseMonth = b.RevenueMonth
I am looking for a way to derive a weighted average from two rows of data with the same number of columns, where the average is as follows (borrowing Excel notation):
((A1*B1)+(A2*B2)+...+(An*Bn))/SUM(A1:An)
The first part reflects the same functionality as Excel's SUMPRODUCT() function.
My catch is that I need to dynamically specify which row gets averaged with weights, and which row the weights come from, and a date range.
EDIT: This is easier than I thought, because Excel was making me think I required some kind of pivot. My solution so far is thus:
select sum(baseSeries.Actual * weightSeries.Actual) / sum(weightSeries.Actual)
from (
select RecordDate , Actual
from CalcProductionRecords
where KPI = 'Weighty'
) baseSeries inner join (
select RecordDate , Actual
from CalcProductionRecords
where KPI = 'Tons Milled'
) weightSeries on baseSeries.RecordDate = weightSeries.RecordDate
Quassnoi's answer shows how to do the SumProduct, and using a WHERE clause would allow you to restrict by a Date field...
SELECT
SUM([tbl].data * [tbl].weight) / SUM([tbl].weight)
FROM
[tbl]
WHERE
[tbl].date >= '2009 Jan 01'
AND [tbl].date < '2010 Jan 01'
The more complex part is where you want to "dynamically specify" which field is [data] and which field is [weight]. The short answer is that realistically you'd have to make use of Dynamic SQL. Something along the lines of (see the sketch after this list):
- Create a string template
- Replace all instances of [tbl].data with the appropriate data field
- Replace all instances of [tbl].weight with the appropriate weight field
- Execute the string
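A hedged sketch of that template approach, keeping the [tbl].data / [tbl].weight placeholders from above (the variable names are hypothetical; QUOTENAME plus a whitelist of allowed column names is the usual guard against injection):
DECLARE @dataCol   sysname = N'data',    -- chosen at runtime, validated against a whitelist
        @weightCol sysname = N'weight',  -- chosen at runtime, validated against a whitelist
        @sql       nvarchar(max);

SET @sql = N'SELECT SUM(t.' + QUOTENAME(@dataCol) + N' * t.' + QUOTENAME(@weightCol) + N')'
         + N' / SUM(t.' + QUOTENAME(@weightCol) + N')'
         + N' FROM [tbl] t'
         + N' WHERE t.[date] >= @from AND t.[date] < @to;';

EXEC sp_executesql @sql,
     N'@from datetime, @to datetime',
     @from = '20090101', @to = '20100101';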
Dynamic SQL, however, carries its own overhead. If the queries are relatively infrequent, or the execution time of the query itself is relatively long, this may not matter. If they are common and short, however, you may notice that using dynamic SQL introduces a noticeable overhead. (Not to mention being careful of SQL injection attacks, etc.)
EDIT:
In your latest example you highlight three fields:
RecordDate
KPI
Actual
When the [KPI] is "Weight Y", then [Actual] the Weighting Factor to use.
When the [KPI] is "Tons Milled", then [Actual] is the Data you want to aggregate.
Some questions I have are:
Are there any other fields?
Is there only ever ONE actual per date per KPI?
The reason I ask is that you want to ensure the JOIN you do is only ever 1:1. (You don't want 5 Actuals joining with 5 Weights, giving 25 resulting records.)
Regardless, a slight simplification of your query is certainly possible...
SELECT
SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
CalcProductionRecords AS [baseSeries]
INNER JOIN
CalcProductionRecords AS [weightSeries]
ON [weightSeries].RecordDate = [baseSeries].RecordDate
-- AND [weightSeries].someOtherID = [baseSeries].someOtherID
WHERE
[baseSeries].KPI = 'Tons Milled'
AND [weightSeries].KPI = 'Weighty'
The commented-out line is only needed if you need additional predicates to ensure a 1:1 relationship between your data and the weights.
If you can't guarantee just one value per date, and don't have any other fields to join on, you can modify your subquery-based version slightly...
SELECT
SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
(
SELECT
RecordDate,
SUM(Actual) AS Actual
FROM
CalcProductionRecords
WHERE
KPI = 'Tons Milled'
GROUP BY
RecordDate
)
AS [baseSeries]
INNER JOIN
(
SELECT
RecordDate,
AVG(Actual) AS Actual
FROM
CalcProductionRecords
WHERE
KPI = 'Weighty'
GROUP BY
RecordDate
)
AS [weightSeries]
ON [weightSeries].RecordDate = [baseSeries].RecordDate
This assumes the AVG of the weight is valid if there are multiple weights for the same day.
EDIT: Someone just voted for this so I thought I'd improve the final answer :)
SELECT
SUM(Actual * Weight) / SUM(Weight)
FROM
(
SELECT
RecordDate,
SUM(CASE WHEN KPI = 'Tons Milled' THEN Actual ELSE NULL END) AS Actual,
AVG(CASE WHEN KPI = 'Weighty' THEN Actual ELSE NULL END) AS Weight
FROM
CalcProductionRecords
WHERE
KPI IN ('Tons Milled', 'Weighty')
GROUP BY
RecordDate
)
AS pivotAggregate
This avoids the JOIN and also only scans the table once.
It relies on the fact that NULL values are ignored when calculating the AVG().
SELECT SUM(A * B) / SUM(A)
FROM mytable
If I have understood the problem, then try this:
SET DATEFORMAT dmy
declare #tbl table(A int, B int,recorddate datetime,KPI varchar(50))
insert into #tbl
select 1,10,'21/01/2009','Weighty' union all
select 2,20,'10/01/2009','Tons Milled' union all
select 3,30,'03/02/2009','xyz' union all
select 4,40,'10/01/2009','Weighty' union all
select 5,50,'05/01/2009','Tons Milled' union all
select 6,60,'04/01/2009','abc' union all
select 7,70,'05/01/2009','Weighty' union all
select 8,80,'09/01/2009','xyz' union all
select 9,90,'05/01/2009','kws' union all
select 10,100,'05/01/2009','Tons Milled'
select SUM(t1.A*t2.A)/SUM(t2.A) AS Result
from (select RecordDate, A, B, KPI from #tbl) t1
inner join (select RecordDate, A, B, KPI from #tbl) t2
on t1.RecordDate = t2.RecordDate
and t1.KPI = t2.KPI