Optimize Query Join Between Dates - sql

3 tables
WorkRecordFact - has WorkDate (date) - ~300,000 rows
EmployeeStatus - StartDate (date), EndDate (date), PositionID - 450 rows
Positions - PositionID, PositionCode - 10 rows
Queries that look for data in WorkRecordFact filtered by position are taking a long time. Basic sample query:
SELECT workrecordfact.*
FROM workrecordfact
INNER JOIN EmployeeStatus
    ON EmployeeStatus.EmployeeID = workrecordfact.EmployeeID
    AND EmployeeStatus.StartDate <= workrecordfact.WorkDate
    AND EmployeeStatus.EndDate >= workrecordfact.WorkDate
INNER JOIN Positions
    ON EmployeeStatus.PositionID = Positions.PositionID
WHERE workrecordfact.WorkDate >= '20180601'
  AND workrecordfact.WorkDate <= '20180930'
  AND PositionCode = 'CSR'
Workrecordfact has a clustered index on Workdate
EmployeeStatus has 4 indexes:
EmployeeID
EmployeeID + StartDate
EmployeeID + EndDate
EmployeeID + StartDate + EndDate
In the query statistics I'm seeing a lot of operators at 500%, starting with a Clustered Index Seek on the WorkRecordFact clustered index. Some numbers that stand out:
Estimated Number of Rows: 250
Estimated Number of Rows to be Read: 667
Number of Executions: 381
Number of Rows Read: 49,525,952 ??!?!?
Actual Number of Rows: 112,018
Results are taking long enough that the .net app sending the query is receiving a timeout in some cases.
I've rebuilt/reorganized the fragmented indexes and refreshed statistics, but that hasn't solved the issue.
Any Ideas?
UPDATE: It seems the query runs quite well from SSMS and only times out from the application. The dates are passed in as parameters, BTW; currently investigating possible issues with parameter sniffing :-/
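If it is parameter sniffing, a quick diagnostic (not necessarily the final fix) is to run the parameterized statement with OPTION (RECOMPILE) appended, so each execution compiles a plan for its actual dates; @DateFrom and @DateTo below stand in for the application's parameter names (assumed, not from the original post):

SELECT workrecordfact.*
FROM workrecordfact
INNER JOIN EmployeeStatus
    ON EmployeeStatus.EmployeeID = workrecordfact.EmployeeID
    AND EmployeeStatus.StartDate <= workrecordfact.WorkDate
    AND EmployeeStatus.EndDate >= workrecordfact.WorkDate
INNER JOIN Positions
    ON EmployeeStatus.PositionID = Positions.PositionID
WHERE workrecordfact.WorkDate >= @DateFrom   -- app-supplied parameter (assumed name)
  AND workrecordfact.WorkDate <= @DateTo     -- app-supplied parameter (assumed name)
  AND PositionCode = 'CSR'
OPTION (RECOMPILE);  -- forces a fresh plan per execution, bypassing any sniffed plan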

First: since workrecordfact has far more records than EmployeeStatus and this is a date-range query, indexing it is always a pain.
Second: drop the following indexes on EmployeeStatus; in my opinion they are of no use for this query:
EmployeeID
EmployeeID+ StartDate
EmployeeID+ EndDate
EmployeeID+ StartDate + EndDate
Then create one more index, on EmployeeID, on the fact table; a sketch is below. I think that should help.
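A sketch of the DDL for that fact-table index (the name is illustrative; the drops would reference whatever the existing EmployeeStatus index names actually are):

-- new supporting index on the large fact table (name is an assumption)
CREATE INDEX IX_WorkRecordFact_EmployeeID
ON WorkRecordFact (EmployeeID);

Note that because WorkDate is the clustered key, a SQL Server nonclustered index on EmployeeID carries WorkDate at its leaf level anyway; the composite (EmployeeId, WorkDate) index suggested in the next answer makes that ordering explicit.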

The name of the game is to limit IO on the workrecordfact table. The way the query and indexing are currently set up, we're scanning 100% of the work in the June-September window, then later filtering that down to just the work of the CSRs. I wonder if the CSR criteria might be more selective and get us to a lower number of rows read?
This query is pretty zippy, right?
SELECT es.EmployeeID, es.StartDate, es.EndDate
FROM Positions p
JOIN Employeestatus es
ON p.PositionID = es.PositionID
WHERE PositionCode = 'CSR'
'CSR' probably matches exactly one Positions row. Assuming positions do approximately equal work, we probably only need to read about 10% of the queried date range of the work table.
I'm thinking about bringing in the work facts afterwards, like this:
SELECT w.*
FROM
(
SELECT es.EmployeeID, es.StartDate, es.EndDate
FROM Positions p
JOIN Employeestatus es
ON p.PositionID = es.PositionID
WHERE PositionCode = 'CSR'
) es2
JOIN workrecordfact w
ON es2.EmployeeID = w.EmployeeID
AND es2.startdate <= w.workdate AND w.workdate <= es2.enddate
WHERE
'2018-06-01' <= w.workdate AND w.workdate <= '2018-09-30'
This query would be best supported by this index:
CREATE INDEX WorkRecordFact_EmployeeId_WorkDate ON WorkRecordFact(EmployeeId, WorkDate)
Moving the conditional choice of workdate logically earlier into the query might be helpful:
SELECT w.*
FROM
(
SELECT es.EmployeeID,
CASE WHEN es.StartDate <= '2018-06-01' THEN '2018-06-01' ELSE es.StartDate END as StartDate,
CASE WHEN '2018-09-30' <= es.EndDate THEN '2018-09-30' ELSE es.EndDate END as EndDate
FROM Positions p
JOIN Employeestatus es
ON p.PositionID = es.PositionID
WHERE PositionCode = 'CSR'
) es2
JOIN workrecordfact w
ON es2.EmployeeID = w.EmployeeID
AND es2.startdate <= w.workdate AND w.workdate <= es2.enddate

Related

Teradata spool space issue on running a sub query with Count

I am using the query below to calculate business days between two dates for all the order numbers. Business days are already available in the Teradata table Common_WorkingCalendar. But I'm also facing a spool space issue when I execute the query, even though I have ample space available in my data lab. I need to optimize the query. Appreciate any inputs.
SELECT
tx."OrderNumber",
(SELECT COUNT(1) FROM Common_WorkingCalendar
WHERE CalDate between Cast(tx."TimeStamp" as date) and Cast(mf.ShipDate as date)) as BusDays
from StoreFulfillment ff
inner join StoreTransmission tx
on tx.OrderNumber = ff.OrderNumber
inner join StoreMerchandiseFulfillment mf
on mf.OrderNumber = ff.OrderNumber
This is a very inefficient way to get this count, and it results in a product join.
The recommended approach is to add a sequential number to your calendar that increases only on business days, calculated using SUM(CASE WHEN businessDay THEN 1 ELSE 0 END) OVER (ORDER BY CalDate ROWS UNBOUNDED PRECEDING) as sketched below; then it's just two joins, one for the start date and one for the end date.
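The general form of that running number, for a calendar that also contains non-business days (businessDay here is a flag column, a placeholder name rather than a known column):

SELECT CalDate,
       SUM(CASE WHEN businessDay = 1 THEN 1 ELSE 0 END)
           OVER (ORDER BY CalDate ROWS UNBOUNDED PRECEDING) AS DayNo
FROM Common_WorkingCalendar;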
If this calculation is needed a lot, you'd better add a new column (a materialization sketch follows the query below); otherwise you can do it on the fly:
WITH cte AS
(
SELECT CalDate,
-- as this table only contains business days you can use this instead
ROW_NUMBER() OVER (ORDER BY CalDate) AS DayNo
FROM Common_WorkingCalendar
)
SELECT
tx."OrderNumber",
to_dt.DayNo - from_dt.DayNo AS BusDays
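-- note: this difference excludes the start date itself; add 1 if an inclusive business-day count is needed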
FROM StoreFulfillment ff
INNER JOIN StoreTransmission tx
ON tx.OrderNumber = ff.OrderNumber
INNER JOIN StoreMerchandiseFulfillment mf
ON mf.OrderNumber = ff.OrderNumber
JOIN cte AS from_dt
ON from_dt.CalDate = Cast(tx."TimeStamp" AS DATE)
JOIN cte AS to_dt
ON to_dt.CalDate = Cast(mf.ShipDate AS DATE)
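If the day number is queried a lot, it could be materialized as a real column. A hedged sketch using MERGE (DayNo is an assumed column name, and Teradata's MERGE needs the ON clause to cover the target's primary index, here assumed to be CalDate):

ALTER TABLE Common_WorkingCalendar ADD DayNo INTEGER;

MERGE INTO Common_WorkingCalendar AS tgt
USING (
    -- same running number as the CTE above
    SELECT CalDate, ROW_NUMBER() OVER (ORDER BY CalDate) AS rn
    FROM Common_WorkingCalendar
) AS src
ON tgt.CalDate = src.CalDate
WHEN MATCHED THEN UPDATE SET DayNo = src.rn;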

Optimize slow running count query

I am trying to summarize banner ad views from a table that is a pretty good size (18,243,847 rows). I need a count of views in the past two years. I tried adding an index on the date and tried different variations of the query below. Most runs take about 25 seconds; however, when it's called through a web service, the target page is timing out. I know the issue is with the count, but I have not been able to reduce that portion below 11 seconds. That doesn't seem like a lot, so why the issue with my web service? Anyway, first things first: is this query the best I can do?
SELECT ba.adID, ba.name, ba.description, ba.startDate, ba.endDate, isNull(v.viewCount,0) AS viewCount, isNull(c.clickCount,0) AS clickCount
FROM bannerAds ba
LEFT OUTER JOIN (SELECT adID, count(viewID) AS viewCount
FROM bannerAdsViews
WHERE viewDateTime IS NOT NULL AND viewDateTime >= DateAdd(yy, -2, GetDate())
GROUP BY adID) v ON ba.adID = v.adID
LEFT OUTER JOIN (SELECT adID, count(viewID) AS clickCount
FROM bannerAdsViews
WHERE clickDateTime IS NOT NULL AND viewDateTime >= DateAdd(yy, -2, GetDate())
GROUP BY adID) c ON ba.adID = c.adID
WHERE v.viewCount > 0
ORDER BY name ASC
FOR XML RAW ('Banner'), ROOT ('Banners');
This query can be difficult to get really good performance on. You are summarizing a lot of data.
However, two subqueries are not needed. If I make the assumption the viewID and viewDateTime are both NULL on the same records, then I think this version is equivalent:
SELECT ba.adID, ba.name, ba.description, ba.startDate, ba.endDate,
COALESCE(vc.viewCount, 0) as viewCount,
COALESCE(vc.clickCount, 0) as clickCount
FROM bannerAds ba JOIN
(SELECT adID, count(viewDateTime) as viewCount,
count(clickDateTime) as clickCount
FROM bannerAdsViews
WHERE viewDateTime >= DateAdd(year, -2, GetDate())
GROUP BY adID
) vc
ON ba.adID = vc.adID
WHERE vc.viewCount > 0
ORDER BY name ASC
FOR XML RAW ('Banner'), ROOT ('Banners');
The INNER JOIN can replace the LEFT JOIN, because the WHERE clause is removing NULL values anyway.
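If more speed is needed, a covering index along these lines might also help, assuming SQL Server (the index name is illustrative):

CREATE INDEX IX_bannerAdsViews_viewDateTime
ON bannerAdsViews (viewDateTime)
INCLUDE (adID, clickDateTime);  -- covers the date filter plus both counted columns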

Counting concurrent records based on startdate and enddate columns

The table structure:
StaffingRecords
PersonnelId int
GroupId int
StaffingStartDateTime datetime
StaffingEndDateTime datetime
How can I get a list of staffing records, given a date and a group id that employees belong to, where the count of present employees fell below a threshold, say, 3, at any minute of the day?
The way my brain works, I would call a stored proc repeatedly with each minute of the day, but of course this would be horribly inefficient:
SELECT COUNT(PersonnelId)
FROM DailyRosters
WHERE GroupId = @GroupId
AND StaffingStartTime <= @TimeParam
AND StaffingEndTime > @TimeParam
GROUP BY GroupId
HAVING COUNT(PersonnelId) < 3
Edit: If it helps to refine the question, employees may come and go throughout the day. Personnel may have a staffing record from 0800 - 0815, and another from 1000 - 1045, for example.
Here is a solution where I find all of the distinct start and end times, and then query to see how many other people are clocked in at each of those times. Every time the answer is less than 4, you know you are understaffed at that time, and presumably until the NEXT start time.
with meaningfulDtms(meaningfulTime, timeType, group_id)
as
(
select distinct StaffingStartTime , 'start' as timeType, group_id
from DailyRosters
union
select distinct StaffingEndTime , 'end' as timeType, group_id
from DailyRosters
)
select COUNT(*), meaningfulDtms.group_id, meaningfulDtms.meaningfulTime
from DailyRosters dr
inner join meaningfulDtms on dr.group_id = meaningfulDtms.group_id
and (
(dr.StaffingStartTime < meaningfulDtms.meaningfulTime
and dr.StaffingEndTime >= meaningfulDtms.meaningfulTime
and meaningfulDtms.timeType = 'start')
OR
(dr.StaffingStartTime <= meaningfulDtms.meaningfulTime
and dr.StaffingEndTime > meaningfulDtms.meaningfulTime
and meaningfulDtms.timeType = 'end')
)
group by meaningfulDtms.group_id, meaningfulDtms.meaningfulTime
having COUNT(*) < 4
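If this check runs frequently, an index along these lines might support the joins above (a sketch; the names are taken from the answer's query, and the index name is illustrative):

CREATE INDEX IX_DailyRosters_group_times
ON DailyRosters (group_id, StaffingStartTime, StaffingEndTime);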
Create a table with all the minutes in the day, with dt as the PK; it will have 1,440 rows. One way to populate it is sketched below.
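A minimal way to populate such a table, assuming SQL Server (the table name allMinutes matches the query below; the anchor date is illustrative):

;WITH mins AS (
    SELECT 0 AS n
    UNION ALL
    SELECT n + 1 FROM mins WHERE n < 1439
)
SELECT DATEADD(MINUTE, n, CAST('20180601' AS datetime)) AS dt  -- one row per minute of the day
INTO allMinutes
FROM mins
OPTION (MAXRECURSION 1440);  -- the default recursion limit of 100 is too low for 1440 rows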
Note this query will not give you a count of zero (i.e., times when no staff are present):
select allMinutes.dt, worktime.grpID, count(distinct worktime.personID)
from allMinutes
join worktime
on allMinutes.dt > worktime.start
and allMinutes.dt < worktime.end
group by allMinutes.dt, worktime.grpID
having count(distinct worktime.personID) < 3
For times with zero staff, I think the best way is a master table of grpID, but I am not sure about this one:
select allMinutes.dt, grpMaster.grpID, count(distinct worktime.personID)
from grpMaster
cross join allMinutes
left join worktime
on allMinutes.dt > worktime.start
and allMinutes.dt < worktime.end
and worktime.grpID = grpMaster.grpID
group by allMinutes.dt, grpMaster.grpID
having count(distinct worktime.personID) < 3

SQL query feels inefficient - how can I improve it?

I'm using the SQL code below in SQLite to get a list of trades from a trades table, combining each trade with the total portfolio value on that day, taken from a holdings table that has position and price data for a set of instruments.
The holdings table has about 150,000 records and the trades table about 1,700.
SELECT t.*, (SELECT p.adjclose FROM prices AS p
WHERE t.instrument = p.instrument
AND p.date = "2013-02-28 00:00:00") as close,
su.mv as mv
FROM trades AS t
left outer join
(SELECT h.date, SUM(h.price * h.position) as mv FROM holdings AS h
WHERE h.portfolio = "usequity"
AND h.date >= "2013-01-11 00:00:00"
AND h.date <= "2013-02-28 00:00:00"
GROUP BY h.date) as su
ON t.date = su.date
WHERE t.portname = "usequity"
AND t.date >= "2013-01-11 00:00:00"
AND t.date <= "2013-02-28 00:00:00";
Running the SQL code returns
[2014-12-01 19:21:00] 123 row(s) retrieved starting from 1 in 572/627 ms
Which seems really slow for a small dataset. Both tables are indexed on instrument and date.
I don't know how to index the table su on the fly so I'm not sure how to improve this code. Any help greatly appreciated.
EDIT
EXPLAIN QUERY PLAN shows:
selectid,order,from,detail
1,0,0,"SEARCH TABLE holdings AS h USING AUTOMATIC COVERING INDEX (portfolio=?) (~7 rows)"
1,0,0,"USE TEMP B-TREE FOR GROUP BY"
0,0,0,"SCAN TABLE trades AS t (~11111 rows)"
0,1,1,"SEARCH SUBQUERY 1 AS su USING AUTOMATIC COVERING INDEX (date=?) (~3 rows)"
0,0,0,"EXECUTE CORRELATED SCALAR SUBQUERY 2"
2,0,0,"SEARCH TABLE prices AS p USING INDEX p1 (instrument=? AND date=?) (~9 rows)"
The lookup on prices is fast (it's using the index for both columns).
You could create a temporary table for the su subquery and add an index to that, but the AUTOMATIC INDEX shows that the database is already doing this.
The lookup on holdings is done with a temporary index; you should create an explicit index for that. (An index on both portfolio and date would be even more efficient.)
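For example (a sketch; the index name is illustrative):

CREATE INDEX holdings_portfolio_date ON holdings(portfolio, date);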
You could avoid the need for a temporary table by looking up the values from holdings dynamically, like you're already doing for the closing price (but this might not be an improvement if there are many trades on the same day):
SELECT t.*,
(SELECT p.adjclose
FROM prices AS p
WHERE p.instrument = t.instrument
AND p.date = '2013-02-28 00:00:00'
) AS close,
(SELECT SUM(h.price * h.position)
FROM holdings AS h
WHERE h.portfolio = 'usequity'
AND h.date = t.date
) AS mv
FROM trades AS t
WHERE t.portname = 'usequity'
AND t.date BETWEEN '2013-01-11 00:00:00'
AND '2013-02-28 00:00:00';

How to implement this logic?

I have designed a script to get each inspector's performance score, which is based on different factors. Inspectors are awarded grades based on their performance score. The script runs overnight as a SQL job and updates all the inspectors' (over 6,500) grades.
We check the last 90 days' progress, but many inspectors who have done no work in the last 90 days are getting full marks. To avoid this situation we have decided to look at the last 90 days and, if the number of reports is zero, go back another 90 days for that inspector.
I.e., if out of 6,500 inspectors, say, 250 have done no work, then the script needs to go back another 90 days for those 250 inspectors and see if they have any work.
This could have been implemented very easily with cursors, but I can't use a cursor as it takes too long, as discussed here: select query in Cursor taking too long
What are the other options? Should I write a function that first checks whether any work has been done in the last 90 days for one inspector and, if not, goes back another 90 days? But wouldn't I still need a cursor for that?
ADDED
I have tried setting the dates in a temp table as mentioned by @Raj, but it is taking too much time. This is the same query that took so long when using a cursor. The other stats run fine, and I think it's something to do with this query.
Requirements:
Number of visits for each inspector where the visit has an uploaded document (DocType 1, 2, or 13)
Tables:
Inspectors: InspectorID
InspectionScope: ScopeID, InspectorID (FK)
Visits: VisitID, VisitDate, ScopeID (FK)
VisitDocs: DocID, DocType, VisitID (FK)
DECLARE
@DateFrom90 date, @DateTo date, @DateFrom180 date, @DateFrom date;

SELECT @DateTo = CAST(GETDATE() AS DATE)
,@DateFrom90 = CAST(GETDATE() - 90 AS DATE)
,@DateFrom180 = CAST(GETDATE() - 180 AS DATE)

DECLARE @Inspectors TABLE (
InspectorID int,
InspectorGrade int,
DateFrom date,
DateTo date
);

insert into @Inspectors (
InspectorID,
InspectorGrade,
DateFrom,
DateTo
)
select
tmp.InspectorID, tmp.InspectorGrade
,case when tmp.VisitWithReport = 0 then @DateFrom180 else @DateFrom90 end StartDate
,@DateTo EndDate
from
(
select
i.InspectorID, i.InspectorGrade
,VisitWithReport = (select COUNT(v.VisitID) from Visits v
inner join InspectionScope s on s.ScopeID = v.ScopeID
where v.ReportStandard not in (0,9) and v.VisitType = 1
and v.VisitDate BETWEEN @DateFrom90 and @DateTo
and s.InspectorID = i.InspectorID)
from Inspectors i
) tmp;

--select * from @Inspectors

SELECT i.InspectorID, i.InspectorGrade
,TotalVisitsWithAtLeastOneReport = (select COUNT(distinct v.VisitID) from Visits v
inner join InspectionScope s on s.ScopeID = v.ScopeID
inner join VisitDocs vd on vd.VisitID = v.VisitID
where vd.DocType IN (1,2,13) and s.InspectorID = i.InspectorID
and v.VisitDate BETWEEN i.DateFrom and i.DateTo
)
from @Inspectors i
You can identify the last job/work date first, before applying any logic. For example, you can store InspectorID and LastWorkDay in a temp table (assuming LastWorkDay is available in some table). Then, based on LastWorkDay, you can decide how many days you have to go back: 90 or 180. This will be another field (StartDate) in the temp table, derived from the LastWorkDay column.
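A minimal sketch of that idea against the question's schema (the temp table #InspectorLastWork and the use of MAX(VisitDate) as LastWorkDay are illustrative assumptions, not a drop-in):

-- last day each inspector actually worked
SELECT s.InspectorID,
       MAX(v.VisitDate) AS LastWorkDay
INTO #InspectorLastWork
FROM InspectionScope s
INNER JOIN Visits v ON v.ScopeID = s.ScopeID
GROUP BY s.InspectorID;

-- derive each inspector's window start: 90 days back if there is recent work, else 180
SELECT ilw.InspectorID,
       CASE WHEN ilw.LastWorkDay >= CAST(GETDATE() - 90 AS date)
            THEN CAST(GETDATE() - 90 AS date)
            ELSE CAST(GETDATE() - 180 AS date)
       END AS StartDate,
       CAST(GETDATE() AS date) AS EndDate
FROM #InspectorLastWork ilw;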