Speed up simple SQL query - sql

We have a table called PROTOKOLL, with the following definition:
PROTOKOLL TableDefinition
The table has 10 million records.
SELECT *
FROM (SELECT /*+ FIRST_ROWS */ a.*, ROWNUM rnum
      FROM (SELECT t0.*, t1.*
            FROM PROTOKOLL t0,
                 PROTOKOLL t1
            WHERE t0.BENUTZER_ID = 'A07BU0006'
              AND t0.TYP = 'E'
              AND t1.UUID = t0.ANDERES_PROTOKOLL_UUID
              AND t1.TYP = 'A'
            ORDER BY t0.ZEITPUNKT DESC
           ) a
      WHERE ROWNUM <= 4999)
WHERE rnum > 0;
So in practice we join the table with itself through the ANDERES_PROTOKOLL_UUID column and apply simple filters. The results are sorted by creation time, and the result set is limited to roughly 5,000 rows (ROWNUM <= 4999).
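For reference, on Oracle 12c and later the same top-N pattern can be written without the nested ROWNUM wrappers (a sketch of the equivalent FETCH FIRST form only, not a tuning claim):
SELECT t0.*, t1.*
FROM PROTOKOLL t0
JOIN PROTOKOLL t1 ON t1.UUID = t0.ANDERES_PROTOKOLL_UUID
WHERE t0.BENUTZER_ID = 'A07BU0006'
  AND t0.TYP = 'E'
  AND t1.TYP = 'A'
ORDER BY t0.ZEITPUNKT DESC
FETCH FIRST 4999 ROWS ONLY;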
The elapsed time of the query is about 10 minutes, which is not acceptable. ☹
I already have the execution plan and statistics in place and am trying to figure out how to speed up the query; please find them attached.
My first observation is that the optimizer adds the condition "P"."ANDERES_PROTOKOLL_UUID" IS NOT NULL to the WHERE clause, but I do not know why. Is it a problem?
Or where is the bottleneck of the query? How can I avoid it? Any suggestion is welcome.


Is there a better way to retrieve a random row from an Oracle table?

Not so long ago I needed to fetch a random row from a table in an Oracle database. The most widespread solution that I've found was this:
SELECT * FROM
( SELECT * FROM tabela WHERE warunek
  ORDER BY dbms_random.value )
WHERE rownum = 1
However, this is very performance-heavy for large tables, as it sorts the table in random order first and then grabs the first row.
Today, one of my colleagues suggested a different way:
SELECT * FROM (
  SELECT * FROM MAIN_PRODUCT
  WHERE ROWNUM <= CAST((SELECT COUNT(*) FROM MAIN_PRODUCT) * dbms_random.value AS INTEGER)
  ORDER BY ROWNUM DESC
) WHERE ROWNUM = 1;
It works way faster and seems to return random values, but does it really? Could someone give an insight into whether it is really random and behaves as expected? I'm really curious why I haven't found this approach anywhere else while looking; if it is indeed random and far better performance-wise, why isn't it more widespread?
This is (possibly) the simplest query that gets the result.
But the SELECT COUNT(*) FROM MAIN_PRODUCT will table-scan; I doubt you can write a query which avoids that.
P.S. This query assumes no deleted records.
Query
SELECT *
FROM MAIN_PRODUCT
WHERE ROWNUM = FLOOR(
        (dbms_random.value * (SELECT COUNT(*) FROM MAIN_PRODUCT)) + 1
      )
FLOOR(
  (dbms_random.value * (SELECT COUNT(*) FROM MAIN_PRODUCT)) + 1
)
will generate a number between 1 and the row count of the table; each evaluation yields a fresh value.
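To see those bounds for yourself, here is a small demo (the factor 10 and the five generated rows are arbitrary choices):
SELECT FLOOR(dbms_random.value * 10 + 1) AS rnd
FROM dual
CONNECT BY LEVEL <= 5;
Every returned value lies between 1 and 10, and re-running the statement produces new values.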
Oracle 12c+ Query
SELECT *
FROM MAIN_PRODUCT
WHERE ROWNUM <= FLOOR(
        (dbms_random.value * (SELECT COUNT(*) FROM MAIN_PRODUCT)) + 1
      )
ORDER BY ROWNUM DESC
FETCH FIRST ROW ONLY
The second query you have
SELECT * FROM (
  SELECT * FROM MAIN_PRODUCT
  WHERE ROWNUM <= CAST((SELECT COUNT(*) FROM MAIN_PRODUCT) * dbms_random.value AS INTEGER)
  ORDER BY ROWNUM DESC
) WHERE ROWNUM = 1;
is excellent, except that it fetches a run of leading rows and keeps only the last of them. dbms_random.value returns a real number between 0 and 1; multiplying it by the number of rows gives you a genuinely random number, and the bottleneck here is counting the rows rather than generating a random value for each row.
Proof
Consider a number x with
0 <= x < 1
If we multiply it by n, we get
0 <= n * x < n
which is exactly what you need if you want to load a single element. The reason this is not widespread is that in many cases the performance issue is not felt, because there are only a few thousand records.
EDIT
If you need k records, not just the first one, it is slightly more difficult, but still solvable. The algorithm would be something like this (I do not have Oracle installed to test it, so I only describe the algorithm):
randomize(n, k)
    randomized <- empty_set
    while (k > 0) do
        newValue <- random(1 .. n)
        n <- n - 1
        k <- k - 1
        //find out how many already-picked elements are lower than or equal to newValue
        //increase newValue by that amount
        //if newValue has now passed further already-picked values, increase it again
        //repeat until there is no need to increase newValue
        add newValue to randomized
    while end
randomize end
If you randomize k elements out of n this way, you will be able to use those values in your filter, as sketched below.
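For completeness, the k randomized row numbers could then be used in the filter like this (a minimal sketch; the literal numbers and the ID ordering column are illustrative assumptions, not part of the original schema):
SELECT *
FROM (
    SELECT p.*, ROW_NUMBER() OVER (ORDER BY p.ID) AS rn
    FROM MAIN_PRODUCT p
)
WHERE rn IN (17, 243, 981);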
The key to improving performance is to lessen the load of the ORDER BY.
If you know about how many rows match the conditions, then you can filter before the sort. For instance, the following takes about 1% of the rows:
SELECT *
FROM (SELECT *
      FROM tabela
      WHERE warunek AND dbms_random.value < 0.01
      ORDER BY dbms_random.value
     )
WHERE rownum = 1;
A variation is to calculate the number of matching values. Then randomly select a smaller sample. The following gets about 100 matching rows and then sorts them for the random selection:
SELECT a.*
FROM (SELECT *
      FROM (SELECT a.*, COUNT(*) OVER () as cnt
            FROM tabela a
            WHERE warunek
           ) a
      WHERE dbms_random.value < 100 / cnt
      ORDER BY dbms_random.value
     ) a
WHERE rownum = 1;

ORACLE SQL - Compare dates without join

I have a very large table of data, 1+ billion rows. If I try to join that table to itself to do a comparison, the cost on the estimated plan makes it unrunnable (cost: 226831405289150). Is there a way I can achieve the same results as the query below without a join, perhaps with an OVER (PARTITION BY ...)?
What I need to do is make sure no other event happened within 24 hours before or after the one with the WILDCARE description was received.
Thanks so much for your help!
select e2.SYSTEM_NO,
       min(e2.DT) as dt
from SYSTEM_EVENT e2
inner join table1.event el2
        on el2.event_id = e2.event_id
left join ( select se.DT
            from SYSTEM_EVENT se
            where ( se.event_id in ('101','102','103','104')    --fails
                 or se.event_id in ('106','107','108','109') )  --restores
          ) e3
       on e3.dt - e2.dt between .0001 and 1
       or e3.dt - e2.dt between -1 and .0001
where el2.descr like '%WILDCARE%'
  and e3.dt is null
  and e2.REC_STS_CD = 'A'
group by e2.SYSTEM_NO
Not having any test data, it is difficult to determine what you are trying to achieve, but it appears you could try using an analytic function with a range window:
SELECT system_no,
       MIN( dt ) AS dt
FROM (
    SELECT se.system_no,
           se.dt,
           se.event_id,
           se.rec_sts_cd,
           COUNT(
             CASE
               WHEN ( se.event_id in ('101','102','103','104')   --fails
                   OR se.event_id in ('106','107','108','109') ) --restores
               THEN 1
             END
           ) OVER (
             ORDER BY se.dt
             RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
           ) AS num
    FROM system_event se -- alias the base table, and expose event_id
                         -- and rec_sts_cd for the outer query
) se
WHERE num = 0
AND   rec_sts_cd = 'A'
AND   EXISTS(
        SELECT 1
        FROM table1.event te
        WHERE te.descr like '%WILDCARE%'
        AND   te.event_id = se.event_id
      )
GROUP BY system_no
This is not a direct answer to your question, but it is a bit too long for a comment.
How old can inserted data be? A 48-hour window means you only need to check a subset of the data, not the whole 1-billion-row table, if data is inserted incrementally. So if it is, please reduce the data being compared with some WITH clause or a temporary table, as sketched below.
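A minimal sketch of that "reduce first" idea (assuming rows arrive incrementally and DT reflects the event time; both are assumptions, since no test data was given):
WITH recent AS (
    SELECT *
    FROM SYSTEM_EVENT
    WHERE DT >= SYSDATE - 2   -- only the last 48 hours can matter
)
SELECT SYSTEM_NO, MIN(DT) AS dt
FROM recent
WHERE REC_STS_CD = 'A'
GROUP BY SYSTEM_NO;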
If you still need to compare across the whole table, I would go for partitioning by event_id, or by another attribute if it makes a better partition key, and compare each group separately (a DDL sketch follows).
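Illustrative DDL for that partitioning idea (a sketch with a simplified column list; the real table has many more columns):
CREATE TABLE system_event_part (
    event_id   VARCHAR2(10),
    system_no  NUMBER,
    dt         DATE,
    rec_sts_cd VARCHAR2(1)
)
PARTITION BY LIST (event_id) (
    PARTITION p_fails    VALUES ('101','102','103','104'),
    PARTITION p_restores VALUES ('106','107','108','109'),
    PARTITION p_other    VALUES (DEFAULT)
);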
where el2.descr like '%WILDCARE%' is a performance killer for such a huge table.

SQL Azure query aggregate performance issue

I'm trying to improve our SQL Azure database performance by replacing a CURSOR, since (as everybody told me) that is something to avoid.
Our table holds GPS information: rows with a clustered index on id, secondary indexes on device and timestamp, and a geography index on location.
I'm trying to compute some statistics, such as minimum speed (Doppler and computed), total distance, and average speed, over a period for a specific device.
I have NO choice about the stats and CAN'T change the table or the output, because of production.
I have a clear performance issue when running this inline table-valued function on my SQL Azure DB.
ALTER FUNCTION [dbo].[fn_logMetrics_3]
(
@p_device smallint,
@p_from dateTime,
@p_to dateTime,
@p_moveThresold int = 1
)
RETURNS TABLE
AS
RETURN
(
WITH CTE AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY timestamp) AS RowNum,
Timestamp,
Location,
Alt,
Speed
FROM
LogEvents
WHERE
Device = @p_device
AND Timestamp >= @p_from
AND Timestamp <= @p_to),
CTE1 AS
(
SELECT
t1.Speed as Speed,
t1.Alt as Alt,
t2.Alt - t1.Alt as DeltaElevation,
t1.Timestamp as Time0,
t2.Timestamp as Time1,
DATEDIFF(second, t2.Timestamp, t1.Timestamp) as Duration,
t1.Location.STDistance(t2.Location) as Distance
FROM
CTE t1
INNER JOIN
CTE t2 ON t1.RowNum = t2.RowNum + 1),
CTE2 AS
(
SELECT
Speed, Alt,
DeltaElevation,
Time0, Time1,
Duration,
Distance,
CASE
WHEN Duration <> 0
THEN (Distance / Duration) * 3.6
ELSE NULL
END AS CSpeed,
CASE
WHEN DeltaElevation > 0
THEN DeltaElevation
ELSE NULL
END As PositiveAscent,
CASE
WHEN DeltaElevation < 0
THEN DeltaElevation
ELSE NULL
END As NegativeAscent,
CASE
WHEN Distance < @p_moveThresold
THEN Duration
ELSE NULL
END As StopTime,
CASE
WHEN Distance > @p_moveThresold
THEN Duration
ELSE NULL
END As MoveTime
FROM
CTE1 t1
)
SELECT
COUNT(*) as Count,
MIN(Speed) as HSpeedMin, MAX(Speed) as HSpeedMax,
AVG(Speed) as HSpeedAverage,
MIN(CSpeed) as CHSpeedMin, MAX(CSpeed) as CHSpeedMax,
AVG(CSpeed) as CHSpeedAverage,
SUM(Distance) as CumulativeDistance,
MIN(Alt) as AltMin, MAX(Alt) as AltMax,
SUM(PositiveAscent) as PositiveAscent,
SUM(NegativeAscent) as NegativeAscent,
SUM(StopTime) as StopTime,
SUM(MoveTime) as MoveTime
FROM
CTE2 t1
)
The broad idea is:
CTE selects the corresponding rows, following the parameters;
CTE1 pairs each row with the next one, in order to get Duration and Distance;
CTE2 then derives further columns from those Distance and Duration values;
finally, the last SELECT does the aggregation, such as sums and averages over each column.
Everything works pretty well until the last SELECT, where the aggregate functions (only a few sums and averages) kill the performance.
This query, selecting 1,500 rows from a table with 4M rows, takes about 1,500 ms.
When replacing the last select with
SELECT COUNT(*) as count FROM CTE2 t1
it takes only a few ms (down to 2 ms according to the SQL Studio statistics).
with
SELECT
COUNT(*) as Count,
SUM(MoveTime) as MoveTime
it's about 125ms
with
SELECT
COUNT(*) as Count,
SUM(StopTime) as StopTime,
SUM(MoveTime) as MoveTime
it's about 250ms
It is as if each aggregate ran as a separate sequential loop over all the rows, within the same thread and without being parallelized.
For information, the CURSOR version of this function (which I wrote a couple of years ago) actually runs at least twice as fast...
What is wrong with this aggregate? How to optimize it?
UPDATE:
The query plans for
SELECT COUNT(*) as Count
The query plans for the full SELECT with aggregates
Following the answer of Joe C, I introduced a #tmp table into the plan and performed the aggregate on it. The result is about twice as fast, which is an interesting fact. A sketch of that variant is below.
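A minimal sketch of that variant (assuming the same LogEvents schema; the parameter values and the reduced column list are illustrative, and the logic would have to move out of the inline function, e.g. into a stored procedure, since inline TVFs cannot create temp tables):
DECLARE @p_device smallint = 1,
        @p_from datetime = '20160101',
        @p_to datetime = '20160102';

-- materialize the filtered rows once
SELECT ROW_NUMBER() OVER (ORDER BY Timestamp) AS RowNum,
       Timestamp, Alt, Speed
INTO #tmp
FROM LogEvents
WHERE Device = @p_device
  AND Timestamp >= @p_from
  AND Timestamp <= @p_to;

-- aggregate over the materialized rows instead of re-deriving them
SELECT COUNT(*) AS Count,
       MIN(Speed) AS HSpeedMin,
       MAX(Speed) AS HSpeedMax,
       AVG(Speed) AS HSpeedAverage
FROM #tmp;

DROP TABLE #tmp;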

Oracle HASH_JOIN_RIGHT_SEMI performance

Here is my query,
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE ID IN ( SELECT ID FROM id_map WHERE code = 'A' )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')
SHIPMENT_ITEMS is a very large table (10.1 TB); id_map is a very small table (12 rows and 3 columns). This query goes through a HASH JOIN RIGHT SEMI and takes a very long time. SHIPMENT_ITEMS is partitioned on the ID column.
If I replace the subquery with hard-coded values, it performs a lot better:
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE ID IN (1, 2, 3)
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')
I cannot remove the subquery, as that leads to hard-coding.
Given that id_map is a very small table, I expect both queries to perform very similarly. Why is the first one taking so much longer?
I'm actually trying to understand why this performs so badly.
I expect dynamic partition pruning to happen here, and I cannot come up with a reason why it is not happening.
https://docs.oracle.com/cd/E11882_01/server.112/e25523/part_avail.htm#BABHDCJG
Try the NO_UNNEST hint.
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE ID IN ( SELECT /*+ NO_UNNEST */ ID FROM id_map WHERE code = 'A' )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')
The CBO will then not try to unnest and join the subquery, but will use it as a filter instead.
Instead of using the IN operator, use EXISTS and check the query performance:
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE EXISTS ( SELECT 1 FROM id_map map WHERE map.code = 'A' AND map.ID = si.ID )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')

How can I query a T-SQL table a limited number of rows at a time

I have a table with (e.g.) 1500 rows. I have determined that (due to the resources of the server) my desired query can quickly process up to 500 rows of my table. Any more than 500 rows at once and it suddenly becomes very slow.
How can I structure a query to process the contents of a table in row groups of 500, through to the end of the table?
[EDIT] The query which takes a long time is this:
select p.parentid, max(c.childtime) ChildTime
from child c
inner join #parents p on p.parentid = c.parentid
    and c.ChildTypeID = 1
    AND c.childtime < getdate()
group by p.parentid
The problem is that the child table has millions of rows and (for reasons I can't go into here) can't be reduced.
The main problem is reducing the number of rows from the child table to make the query performant. Unfortunately, this query is being run to populate a temporary table so that a subsequent query can execute quickly.
One possibility is to use the windowing functions. The only trick is that they cannot be used in the WHERE clause, so you will have to use subqueries.
Select a.*, b.*
From
(
    Select *, rownum = ROW_NUMBER() over (order by fieldx)
    From TableA
) a
Inner Join TableB b on a.fieldx = b.fieldx
Where a.rownum between @startnum and @endnum -- @startnum/@endnum supplied by the caller
If this is just for processing, you might be able to do something like this:
DECLARE @RowsPerPage INT = 500, @PageNumber INT = 1
DECLARE @TotalRows INT
SET @TotalRows = (SELECT count(1) from test)
-- loop while any rows remain, including the final partial page
WHILE ((@PageNumber - 1) * @RowsPerPage < @TotalRows)
BEGIN
    SELECT
        Id
    FROM
    (
        SELECT
            Id,
            ROW_NUMBER() OVER (ORDER BY Id) as RowNum
        FROM Test
    ) AS sub
    WHERE sub.RowNum BETWEEN ((@PageNumber - 1) * @RowsPerPage) + 1
                         AND @RowsPerPage * @PageNumber
    SET @PageNumber = @PageNumber + 1
END
This calculates the total rows and then loops, paging through the results. It's not too helpful if you need your results together, though, because it runs X separate queries. You might be able to put this in a stored procedure and union the results, or something crazy like that, to get the results in one "query".
I still think a better option would be to fix the slowness in your original query. Is it possible to load the join results into a CTE/temp table so the calculations are only performed a single time (a sketch below)? If you could give an example of your query, that would help a lot...
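For instance, something along these lines (a sketch assuming the child/#parents schema from the question; #maxtimes is an illustrative name):
-- materialize the expensive join/aggregation once
SELECT p.parentid, MAX(c.childtime) AS ChildTime
INTO #maxtimes
FROM child c
INNER JOIN #parents p ON p.parentid = c.parentid
WHERE c.ChildTypeID = 1
  AND c.childtime < GETDATE()
GROUP BY p.parentid;

-- the subsequent query then reads the small temp table instead
SELECT * FROM #maxtimes;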