Created a SQL query to summarize some data. It is slow, so I thought I'd ask for some help.
Table is a log table that has :
loc, tag, entrytime, exittime, visits, entrywt, exitwt
My test log has 700,000 records in it. The entrytime and exittime are epoch values.
I know my query is inefficient as it rips through the table 4 times.
select
loc, edate, tag,
(select COUNT(*) from mylog as ml
where mvlog.loc = ml.loc
and mvlog.edate = CONVERT(date, DATEADD(ss, ml.entrytime, '19700101'))
and mvlog.tag = ml.tag) as visits,
(select SUM(entrywt - exitwt) from mylog as ml2
where mvlog.loc = ml2.loc
and mvlog.edate = CONVERT(date, DATEADD(ss, ml2.entrytime, '19700101'))
and mvlog.tag = ml2.tag) as consumed,
(select SUM(exittime - entrytime) from mylog as ml3
where mvlog.loc = ml3.loc
and mvlog.edate = CONVERT(date, DATEADD(ss, ml3.entrytime, '19700101'))
and mvlog.tag = ml3.tag) as occupancy
from
eventlogV as mvlog with (INDEX(pt_index))
Index pt_index is made up of columns tag and loc.
When I run this query, it completes in roughly 30 seconds. Since my query is inefficient, I am sure it can be better.
Any ideas appreciated.
Seems like you can just LEFT JOIN mylog to eventlogV once and get the same results.
SELECT mvlog.loc,
mvlog.edate,
mvlog.tag,
COUNT(ml.loc) AS visits,
SUM(entrywt - exitwt) AS consumed,
SUM(exittime - entrytime) AS occupancy
FROM eventlogV AS mvlog
LEFT OUTER JOIN mylog ml ON mvlog.loc = ml.loc
AND mvlog.edate = CONVERT(DATE,DATEADD(ss,ml.entrytime,'19700101'))
AND mvlog.tag = ml.tag
GROUP BY mvlog.loc,
mvlog.edate,
mvlog.tag
Related
I have a very slow query (30+ minutes or more) that I think can be sped up with more efficient coding. Below is the code and the query plan that results. So I am looking for answers to speed up with query that is performing several joins on large tables.
drop table if exists totalshad;
create temporary table totalshad as
select pricedate, hour, sum(cast(price as numeric)) as totalprice from
pjm.rtcons
where
rtcons.pricedate >= '2017-12-01'
-- and
-- rtcons.pricedate <= '2018-01-23'
group by pricedate, hour
order by pricedate, hour;
-----------------------------
drop table if exists percshad;
create temporary table percshad as
select totalshad.pricedate, totalshad.hour, facility, round(sum(cast(price
as numeric)),2) as cons_shad, round(sum(cast(totalprice as numeric)),2) as
total_shad, round(cast(price/totalprice as numeric),4) as per_shad from
totalshad
join pjm.rtcons on
rtcons.pricedate = totalshad.pricedate
and
rtcons.hour = totalshad.hour
and
facility = 'ETOWANDA-NMESHOPP ETL 1057 A 115 KV'
where totalprice <> 0 and totalshad.pricedate > '2017-12-01'
group by totalshad.pricedate, totalshad.hour, facility,
(price/totalprice)
order by per_shad desc
limit 5;
EXPLAIN select facility, percshad.pricedate, percshad.hour, per_shad,
minmcc.rtmcc, minnode.nodename, maxmcc.rtmcc, maxnode.nodename from percshad
join pjm.prices minmcc on
minmcc.pricedate = percshad.pricedate
and
minmcc.hour = percshad.hour
and
minmcc.rtmcc = (select min(rtmcc) from pjm.prices where pricedate =
percshad.pricedate and hour = percshad.hour)
join pjm.nodes minnode on
minnode.node_id = minmcc.node_id
join pjm.prices maxmcc on
maxmcc.pricedate = percshad.pricedate
and
maxmcc.hour = percshad.hour
and
maxmcc.rtmcc = (select max(rtmcc) from pjm.prices where pricedate =
percshad.pricedate and hour = percshad.hour)
join pjm.nodes maxnode on
maxnode.node_id = maxmcc.node_id
order by per_shad desc
limit 5
And here is the EXPLAIN output:
UPDATE: I have now simplified my code down to the following. But as can be seen from the EXPLAIN, it stills takes forever to find the node_id in the last select statement
drop table if exists totalshad;
create temporary table totalshad as
select pricedate, hour, sum(cast(price as numeric)) as totalprice from
pjm.rtcons
where
rtcons.pricedate >= '2017-12-01'
-- and
-- rtcons.pricedate <= '2018-01-23'
group by pricedate, hour
order by pricedate, hour;
-----------------------------
drop table if exists percshad;
create temporary table percshad as
select totalshad.pricedate, totalshad.hour, facility, round(sum(cast(price
as numeric)),2) as cons_shad, round(sum(cast(totalprice as numeric)),2) as
total_shad,
round(cast(price/totalprice as numeric),4) as per_shad from totalshad
join pjm.rtcons on
rtcons.pricedate = totalshad.pricedate
and
rtcons.hour = totalshad.hour
and
facility = 'ETOWANDA-NMESHOPP ETL 1057 A 115 KV'
where totalprice <> 0 and totalshad.pricedate > '2017-12-01'
group by totalshad.pricedate, totalshad.hour, facility, (price/totalprice)
order by per_shad desc
limit 5;
drop table if exists mincong;
create temporary table mincong as
select pricedate, hour, min(rtmcc) as rtmcc
from pjm.prices JOIN percshad USING (pricedate, hour)
group by pricedate, hour;
EXPLAIN select distinct on (pricedate, hour) prices.node_id from mincong
JOIN pjm.prices USING (pricedate, hour, rtmcc)
group by pricedate, hour, node_id
The problem are the subselects in the join condition; they have to be executed for every row joined.
If you cannot get rid of them, try to create an index that will support the subselects as good as possible:
CREATE INDEX ON pjm.prices(pricedate, hour, rtmcc);
i have a select statement that contains hundred thousands if data, however the execution time is very slow which take longer than 15 minutes. Is the any way that i can improve the execution time for this select statement.
select a.levelP,
a.code,
a.descP,
(select nvl(SUM(amount),0) from ca_glopen where code = a.code and acc_mth = '2016' ) ocf,
(select nvl(SUM(amount),0) from ca_glmaintrx where code = a.code and to_char(doc_date,'yyyy') = '2016' and to_char(doc_date,'yyyymm') < '201601') bcf,
(select nvl(SUM(amount),0) from ca_glmaintrx where jum_amaun > 0 and code = a.code and to_char(doc_date,'yyyymm') = '201601' ) debit,
(select nvl(SUM(amount),0) from ca_glmaintrx where jum_amaun < 0 and code = a.code and to_char(doc_date,'yyyymm') = '201601' ) credit
from ca_chartAcc a
where a.code is not null
order by to_number(a.code), to_number(levelP)
please help me for the way to up speed my query and result.TQ
Your primary problem is that most of your subqueries use functions on your search criteria, including some awkward ones on your dates. It's much better to flip that around and explicitly qualify the expected range, by supplying actual dates (a one month range is usually a small percentage of total rows, so this is very likely to hit an index).
SELECT Chart.levelP, Chart.code, Chart.descP,
COALESCE(GL_SUM.ocf, 0),
COALESCE(Transactions.bcf, 0),
COALESCE(Transactions.debit, 0),
COALESCE(Transactions.credit, 0),
FROM ca_ChartAcc Chart
LEFT JOIN (SELECT code, SUM(amount) AS ocf
FROM ca_GLOpen
WHERE acc_mth = '2016') GL_Sum
ON GL_Sum.code = Chart.code
LEFT JOIN (SELECT code,
SUM(amount) AS bcf,
SUM(CASE WHEN amount > 0 THEN amount) AS debit,
SUM(CASE WHEN amount < 0 THEN amount) AS credit,
FROM ca_GLMainTrx
WHERE doc_date >= TO_DATE('2016-01-01')
AND doc_date < TO_DATE('2016-02-01')) Transactions
ON Transactions.code = Chart.code
WHERE Chart.code IS NOT NULL
ORDER BY TO_NUMBER(Chart.code), TO_NUMBER(Chart.levelP)
If you only need a few codes, it may yield better results to push those values into the subqueries as well (although note that the optimizer is likely to perform this for you).
It may be possible to remove the calls to TO_NUMBER(...) from the ORDER BY clause; however, this depends on the format of the values, since how they were encoded may change the ordering of results.
I have a table with data every hour:
and a table with data every 30 minutes:
I would like to join these two tables (by date) to get batVolt and TA in the same table and repeat the values for batVolt for the 30 minutes between the hour.
SELECT *
FROM HourTable t
INNER JOIN HalfHourTable ht
ON CAST(t.repDate AS Date) = CAST(ht.repDate AS Date)
AND DATEPART(HOUR, t.repDate) = DATEPART(HOUR, ht.repDate)
Edit
Your query should be
SELECT n.repDate
, n.TA
, a.batVolt
FROM [DAP].[dbo].[ARRMet] AS n
FULL JOIN [DAP].[dbo].[array3] AS a
ON DATEPART(HOUR, n.repDate) = DATEPART(HOUR, a.repDate)
AND CAST(n.repDate AS DATE) = CAST(a.repDate AS DATE)
WHERE CAST(n.repDate AS DATE) = '20150831'
ORDER BY n.repDate DESC
I would do this slightly differently than M.Ali. This uses fewer functions and seems a bit simpler to me.
SELECT *
FROM HourTable t
INNER JOIN HalfHourTable ht on
t.repDate = dateadd(hour, datediff(hour, 0, ht.repDate), 0)
You could use DATEPART()
select ta.repDate, ta.code, ta.TA, bat.batVolt
from table2 ta
join table 1 bat
on (DATEPART(yyyymmddhh,bat.repDate) = DATEPART(yyyymmddhh, ta.repDate)
I don't remember exact syntax for datepart but should be something similar to this
You can try following
SELECT x.*, y.*
FROM BatVoltDetails x
INNER JOIN TADetails y
ON LEFT(CONVERT(VARCHAR, x.repDate, 120), 13) = LEFT(CONVERT(VARCHAR, y.repDate, 120), 13)
We've got a query that is taking a very long time to complete with a large dataset. I think I've tracked it down to a table-value function in the SQL server.
The query is designed to return the difference in printing usage between two dates. So if a printer had usage of 100 at date x and 200 at date y a row needs to be returned which reflects that it has had a usage change of 100.
These readings are taken periodically (but not every day) and stored in a table called MeterReadings. The code for the table-value function is below. This is then called from another SQL query which joins the returned table on a devices table with an inner join to get extra device information.
Any advise as to how to optimise the below would be appreciated.
ALTER FUNCTION [dbo].[DeviceUsage]
-- Add the parameters for the stored procedure here
( #StartDate DateTime , #EndDate DateTime )
RETURNS table
AS
RETURN
(
SELECT MAX(dbo.MeterReadings.ScanDateTime) AS MX,
MAX(dbo.MeterReadings.DeviceTotal - reading.DeviceTotal) AS TotalDiff,
MAX(dbo.MeterReadings.TotalCopy - reading.TotalCopy) AS CopyDiff,
MAX(dbo.MeterReadings.TotalPrint - reading.TotalPrint) AS PrintDiff,
MAX(dbo.MeterReadings.TotalScan - reading.TotalScan) AS ScanDiff,
MAX(dbo.MeterReadings.TotalFax - reading.TotalFax) AS FaxDiff,
MAX(dbo.MeterReadings.TotalMono - reading.TotalMono) AS MonoDiff,
MAX(dbo.MeterReadings.TotalColour - reading.TotalColour) AS ColourDiff,
MIN(reading.ScanDateTime) AS MN, dbo.MeterReadings.DeviceID
FROM dbo.MeterReadings INNER JOIN (SELECT * FROM dbo.MeterReadings WHERE
(dbo.MeterReadings.ScanDateTime > #StartDate) AND
(dbo.MeterReadings.ScanDateTime < #EndDate) )
AS reading ON dbo.MeterReadings.DeviceID = reading.DeviceID
WHERE (dbo.MeterReadings.ScanDateTime > #StartDate) AND (dbo.MeterReadings.ScanDateTime < #EndDate)
GROUP BY dbo.MeterReadings.DeviceID);
On the assumption that a value can only ever increase over time, it can certainly be simplified.
SELECT
DeviceID,
MIN(ScanDateTime) AS MN,
MAX(ScanDateTime) AS MX,
MAX(DeviceTotal ) - MIN(DeviceTotal) AS TotalDiff,
MAX(TotalCopy ) - MIN(TotalCopy ) AS CopyDiff,
MAX(TotalPrint ) - MIN(TotalPrint ) AS PrintDiff,
MAX(TotalScan ) - MIN(TotalScan ) AS ScanDiff,
MAX(TotalFax ) - MIN(TotalFax ) AS FaxDiff,
MAX(TotalMono ) - MIN(TotalMono ) AS MonoDiff,
MAX(TotalColour ) - MIN(TotalColour) AS ColourDiff
FROM
dbo.MeterReadings
WHERE
ScanDateTime > #StartDate
AND ScanDateTime < #EndDate
GROUP BY
DeviceID
This assumes that if you have reading on dates 1, 3, 5, 7, 9 and you want to report on 2 -> 8 then you want reading 7 - reading 3. I would have thought you wanted reading 7 - reading 1?
The above query should be fine for relatively small ranges. If you have Huge ranges of time, the MAX() - MIN() will be operating on large numbers of rows. This can then possibly be improved even further with the following (with correlated sub-queries to lookup just the two rows that you want).
As a side benefit, this also works even if the values can go down as well as up.
(I assume the existance of a Device table for a simpler query and faster performance.)
SELECT
Device.DeviceID,
start.ScanDateTime AS MN,
finish.ScanDateTime AS MX,
finish.DeviceTotal - start.DeviceTotal AS TotalDiff,
finish.TotalCopy - start.TotalCopy AS CopyDiff,
finish.TotalPrint - start.TotalPrint AS PrintDiff,
finish.TotalScan - start.TotalScan AS ScanDiff,
finish.TotalFax - start.TotalFax AS FaxDiff,
finish.TotalMono - start.TotalMono AS MonoDiff,
finish.TotalColour - start.TotalColour AS ColourDiff
FROM
dbo.Device AS device
INNER JOIN
dbo.MeterReadings AS start
ON start.DeviceID = device.DeviceID
AND start.ScanDateTime = (SELECT MIN(ScanDateTime)
FROM dbo.MeterReadings
WHERE DeviceID = device.DeviceID
AND ScanDateTime > #startDate
AND ScanDateTime < #endDate)
INNER JOIN
dbo.MeterReadings AS finish
ON finish.DeviceID = device.DeviceID
AND finish.ScanDateTime = (SELECT MAX(ScanDateTime)
FROM dbo.MeterReadings
WHERE DeviceID = device.DeviceID
AND ScanDateTime > #startDate
AND ScanDateTime < #endDate)
This can also be modified to pick up the start as being the first date on or before #startDate, if required.
EDIT: Modification to pick the start reading as being for the first date on or before #startDate.
SELECT
Device.DeviceID,
start.ScanDateTime AS MN,
finish.ScanDateTime AS MX,
COALESCE(finish.DeviceTotal, 0) - COALESCE(start.DeviceTotal, 0) AS TotalDiff,
COALESCE(finish.TotalCopy , 0) - COALESCE(start.TotalCopy , 0) AS CopyDiff,
COALESCE(finish.TotalPrint , 0) - COALESCE(start.TotalPrint , 0) AS PrintDiff,
COALESCE(finish.TotalScan , 0) - COALESCE(start.TotalScan , 0) AS ScanDiff,
COALESCE(finish.TotalFax , 0) - COALESCE(start.TotalFax , 0) AS FaxDiff,
COALESCE(finish.TotalMono , 0) - COALESCE(start.TotalMono , 0) AS MonoDiff,
COALESCE(finish.TotalColour, 0) - COALESCE(start.TotalColour, 0) AS ColourDiff
FROM
dbo.Device AS device
LEFT JOIN
dbo.MeterReadings AS start
ON start.DeviceID = device.DeviceID
AND start.ScanDateTime = (SELECT MAX(ScanDateTime)
FROM dbo.MeterReadings
WHERE DeviceID = device.DeviceID
AND ScanDateTime < #startDate)
LEFT JOIN
dbo.MeterReadings AS finish
ON finish.DeviceID = device.DeviceID
AND finish.ScanDateTime = (SELECT MAX(ScanDateTime)
FROM dbo.MeterReadings
WHERE DeviceID = device.DeviceID
AND ScanDateTime < #endDate)
Your query seems to compute a cross-product of all readings in a time range for each particular device. This works semantically because the MIN and MAX aggregates don't care about duplicates. But this is very slow. If you are comparing 100 dates with themselves you need to process 10,000 rows.
I suggest you calculate the MIN and MAX values for each metric/column over the entire time interval and then subtract them. That way you don't need to join and you need a single pass ofer the data. Like this:
select Diff = MAX(col) - MIN(col)
from readings
group by DeviceID
I wrote this code that find the down nodes and calculate the up and down hours. This code works but I want to know any other way or optimize of this code? what is the best way to calculate the duration of down time? and Is there any way(interactive way) that user can input the date and time interval?
SELECT q1.nodeid, q1.VendorIcon, q1.Caption, q1.IP_Address,
q1.OutageDurationInMinutes,
q2.TimeUp
FROM
(SELECT
Nodes.NodeID AS NodeID, ltrim(rtrim(Nodes.Caption)) Caption, Nodes.VendorIcon,Nodes.IP_Address,
sum(DATEDIFF(hh, StartTime.EventTime, EndTime.EventTime)) as OutageDurationInMinutes
FROM Events StartTime
Left join Events EndTime On
EndTime.EventType = '5' and
EndTime.NetObjectType = 'N' and
EndTime.NetworkNode = StartTime.NetworkNode and
EndTime.EventTime =
(
Select
min(EventTime)
from Events
where
EventTime>StartTime.EventTime and
EventType = '5' and
NetObjectType = 'N' and
NetworkNode = StartTime.NetworkNode
)
INNER JOIN Nodes ON
StartTime.NetworkNode = Nodes.NodeID
WHERE
Nodes.Department = '4' AND
StartTime.EventType = 1 AND
StartTime.NetObjectType = 'N' AND
StartTime.eventtime between dateadd(M, -1, getdate()) and getdate()
Group by
Nodes.NodeID,Nodes.Caption, Nodes.VendorIcon,Nodes.IP_Address, Nodes.LastBoot
) q1
INNER JOIN
(SELECT
Nodes.NodeID AS NodeID
,ltrim(rtrim(Caption)) Caption
,VendorIcon
,Ip_Address
,DateDiff(hour,Nodes.LastBoot,GetDate()) AS HoursUp
,CONVERT(VARCHAR(40), DATEDIFF(minute, Nodes.LastBoot, GETDATE())/(24*60))
+ ' days, '
+ CONVERT(VARCHAR(40), DATEDIFF(minute, Nodes.LastBoot, GETDATE())%(24*60)/60)
+ ' hours, and '
+ CONVERT(VARCHAR(40), DATEDIFF(minute, Nodes.LastBoot, GETDATE())%60)
+ ' minutes.' AS TimeUp
FROM [Nodes]
Where
LastBoot between dateadd(day, -30, getdate()) and getdate()) q2 on q1.NodeID=q2.NodeID
Order by Caption
I want to know any other way or
optimize of this code?
I would recommend having a look at the query execution plan for you query.
Is there any way(interactive way) that
user can input the date and time
interval?
You could just determine the values before you run the query and use these parameters in your query (I'm not sure what's calling your query though, is it a stored procedure?)