Calculating AVG for NULL values from all previous rows - sql

I have a table with 4 columns like this: EmployeeID, Date, StartTime, EndTime. The first two columns are not nullable, but the other two are.
I want to generate a report and fill the missing StartTime and EndTime with AVG value of the previous rows. I'm using the following statement for the StartTime column:
ISNULL([StartTime], DATEADD(SECOND, AVG([dbo].[GetTimeInSecondsFromDateTime]([StartTime])) OVER (PARTITION BY [EmployeeID] ORDER BY [Date]), [Date]))
The problem is that when I have 2 NULL values one after another, they get the same value (the AVG of all the previous ones). What I need is for the calculation of the second NULL value to include the previous one too (the one that was just calculated). The thing is, I have no idea how to implement it.

This query is not tested, but I hope it helps.
Because of the NULL values, I suggest updating StartTime first:
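UPDATE t1
SET StartTime = ISNULL(StartTime, t2.AvgStartTime)
FROM yourTable t1
JOIN (
    SELECT
        EmployeeID,
        [Date],
        -- AVG does not accept date/time types, so average the seconds
        -- (helper function taken from the question) and convert back
        DATEADD(SECOND,
                AVG([dbo].[GetTimeInSecondsFromDateTime]([StartTime]))
                    OVER (PARTITION BY EmployeeID ORDER BY [Date]),
                [Date]) AS AvgStartTime
    FROM yourTable
) t2 ON t1.EmployeeID = t2.EmployeeID
    AND t1.[Date] = t2.[Date] -- assumes one row per employee per date
WHERE
    t1.StartTime IS NULL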
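Then do the same for EndTime:
UPDATE t1
SET EndTime = ISNULL(EndTime, t2.AvgEndTime)
FROM yourTable t1
JOIN (
    SELECT
        EmployeeID,
        [Date],
        -- same approach as for StartTime: average the seconds, then convert back
        DATEADD(SECOND,
                AVG([dbo].[GetTimeInSecondsFromDateTime]([EndTime]))
                    OVER (PARTITION BY EmployeeID ORDER BY [Date]),
                [Date]) AS AvgEndTime
    FROM yourTable
) t2 ON t1.EmployeeID = t2.EmployeeID
    AND t1.[Date] = t2.[Date] -- assumes one row per employee per date
WHERE
    t1.EndTime IS NULL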
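One observation that is not part of the answer above: feeding a filled value back into the running average does not change the value for the very next gap. If m is the mean of the n earlier values, adding m as one more value gives (n*m + m)/(n + 1) = m, so two consecutive NULLs receive the same fill either way. A later gap, after more real values have arrived, can differ. Below is a minimal Python sketch of that behaviour; the function name and the sample values (seconds since midnight for one employee, ordered by date) are made up for illustration.

```python
from typing import List, Optional


def fill_with_running_mean(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each None with the mean of all earlier values, earlier fills included."""
    filled: List[Optional[float]] = []
    total = 0.0
    count = 0
    for value in values:
        # A gap can only be filled once at least one earlier value exists.
        if value is None and count > 0:
            value = total / count
        if value is not None:
            total += value
            count += 1
        filled.append(value)
    return filled


# Hypothetical start times in seconds since midnight (08:00, 09:00, gap, gap, 10:00, gap).
start_seconds = [28800, 32400, None, None, 36000, None]
filled_seconds = fill_with_running_mean(start_seconds)
print(filled_seconds)
```

Both consecutive gaps come out as 30600 (08:30), while the last gap is filled with 31680 rather than 32400, the mean of the three real values alone. So the fill-inclusive rule only matters for gaps that follow later real values.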

Related

Date Difference between consecutive rows adding additional columns

Say I added a Cost Difference column to the second table from Rishal (see the below link for this previous post), how would I also calculate and display that?
Using just the 1001 Account Number and adding the following amounts of ID1=$10, ID4=$33 and ID6=$50 to the first table, how would I display in Rishal's second table a result of $23 and $17 in addition to the other 3 columns that are already there?
I've used this code (from GarethD) and would like to insert my Cost Difference column within this...Thanks in advance,
SELECT ID,
AccountNumber,
Date,
NextDate,
DATEDIFF("D", Date, NextDate)
FROM ( SELECT ID,
AccountNumber,
Date,
( SELECT MIN(Date)
FROM YourTable T2
WHERE T2.Accountnumber = T1.AccountNumber
AND T2.Date > T1.Date
) AS NextDate
FROM YourTable T1
) AS T
Date Difference between consecutive rows
I would recommend using JOIN to bring in the entire next record:
SELECT T.*, DATEDIFF("D", T.Date, T.NextDate) as datediff,
       TNext.Amount, (TNext.Amount - T.Amount) as amountdiff
FROM (SELECT T1.*,
             (SELECT MIN(Date)
              FROM YourTable T2
              WHERE T2.AccountNumber = T1.AccountNumber AND
                    T2.Date > T1.Date
             ) AS NextDate
      FROM YourTable as T1
     ) AS T LEFT JOIN
     YourTable as TNext
     ON T.AccountNumber = TNext.AccountNumber AND T.NextDate = TNext.Date;
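To check the join idea against the numbers in the question, here is a small sketch that runs the same pattern in SQLite through Python's sqlite3 module. The amounts (10, 33, 50 for account 1001) come from the question; the dates are invented, and julianday() stands in for DATEDIFF because SQLite has no DATEDIFF function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (ID INTEGER, AccountNumber INTEGER, Date TEXT, Amount INTEGER)"
)
# Amounts are from the question; the dates are hypothetical.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [
        (1, 1001, "2011-10-10", 10),
        (4, 1001, "2011-10-15", 33),
        (6, 1001, "2011-10-25", 50),
    ],
)

rows = conn.execute(
    """
    SELECT T.ID,
           T.Date,
           T.NextDate,
           CAST(julianday(T.NextDate) - julianday(T.Date) AS INTEGER) AS DaysDiff,
           TNext.Amount - T.Amount AS AmountDiff
    FROM (SELECT T1.*,
                 (SELECT MIN(T2.Date)
                  FROM YourTable T2
                  WHERE T2.AccountNumber = T1.AccountNumber
                    AND T2.Date > T1.Date) AS NextDate
          FROM YourTable T1) AS T
    LEFT JOIN YourTable AS TNext
           ON TNext.AccountNumber = T.AccountNumber
          AND TNext.Date = T.NextDate   -- join the next row on its date
    ORDER BY T.ID
    """
).fetchall()

for row in rows:
    print(row)
```

The first two rows show cost differences of 23 and 17, as asked for in the question. The last row has no following record, so the LEFT JOIN leaves its differences NULL.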

Ignoring Duplicate Records SQL

In need of some help :)
So I have a table of records with the following columns:
Key (PK, FK, int) DT (smalldatetime) Value (real)
The DT is a datetime for every half hour of the day with an associated value
E.g.
Key DT VALUE
1000 2010-01-01 08:00:00 80
1000 2010-01-01 08:30:00 75
1000 2010-01-01 09:00:00 100
I have a query that finds the max value in every 24-hour period and its associated time. However, on one day the max value occurs twice, which duplicates the date and causes processing issues. I have tried using ROW_NUMBER(), which works, but I can't use a calculated column in my WHERE clause.
Currently I have:
SELECT cast(T1.DT as date) as 'Date',Cast(T1.DT as time(0)) as 'HH', ROW_NUMBER() over (PARTITION BY cast(DT as date) ORDER BY DT) AS 'RowNumber'
FROM TABLE_1 AS T1
INNER JOIN (
SELECT CAST([DT] as date) as 'DATE'
, MAX([VALUE]) as 'MAX_HH'
FROM TABLE_1
WHERE DT > '6-nov-2016' and [KEY] = '1000'
GROUP BY CAST([DT] as date)
) AS MAX_DT
ON MAX_DT.[DATE] = CAST(T1.[DT] as date)
AND T1.VALUE = MAX_DT.MAX_HH
WHERE DT > '6-nov-2016' and [KEY] = '1000'
ORDER BY DT
This results in
Key DT VALUE HH
1000 2010-01-01 80 07:00:00
1000 2010-02-01 100 17:30:00
1000 2010-02-01 100 18:00:00
I need to remove the duplicate date (I have no preference which HH it takes).
I think I've explained that terribly; let me know if it makes no sense and I'll try to rewrite it.
Any ideas?
Can you try this? The changes are the MIN(...) around the time column and the GROUP BY at the end:
SELECT cast(T1.DT as date) as 'Date', MIN(Cast(T1.DT as time(0))) as 'HH'
FROM TABLE_1 AS T1
INNER JOIN (
    SELECT CAST([DT] as date) as 'DATE'
    , MAX([VALUE]) as 'MAX_HH'
    FROM TABLE_1
    WHERE DT > '6-nov-2016' and [KEY] = '1000'
    GROUP BY CAST([DT] as date)
) AS MAX_DT
ON MAX_DT.[DATE] = CAST(T1.[DT] as date)
AND T1.VALUE = MAX_DT.MAX_HH
WHERE DT > '6-nov-2016' and [KEY] = '1000'
GROUP BY cast(T1.DT as date)
ORDER BY cast(T1.DT as date)
I would do something like this.
I didn't try it, but I think it is correct.
SELECT cast(T1.DT as date) as 'Date', Cast(T1.DT as time(0)) as 'HH', T1.VALUE
FROM TABLE_1 T1
WHERE T1.[KEY] = '1000'
AND T1.[DT] IN (
    -- select the latest DT for each day among the rows holding that day's max value
    SELECT MAX(T2.[DT])
    FROM TABLE_1 T2
    INNER JOIN (
        SELECT CAST([DT] as date) as CAST_DATE
        , MAX([VALUE]) as MAX_HH
        FROM TABLE_1
        WHERE DT > '6-nov-2016' and [KEY] = '1000'
        GROUP BY CAST([DT] as date)
    ) M ON M.CAST_DATE = CAST(T2.[DT] as date) AND M.MAX_HH = T2.[VALUE]
    WHERE T2.DT > '6-nov-2016' and T2.[KEY] = '1000'
    GROUP BY CAST(T2.[DT] as date)
)
Change the JOIN to an APPLY.
The APPLY operation will allow you to limit the connected relation to just one result for each source relation.
SELECT v.[Key], cast(v.DT As Date) as "Date", v.[Value], cast(v.DT as Time(0)) as "HH"
FROM
( -- First a projection to get just the exact dates you want
SELECT DISTINCT [Key], CAST(DT as DATE) as DT
FROM Table_1
WHERE [Key] = '1000' AND DT > '20161106'
) dates
CROSS APPLY (
-- Then use APPLY rather than JOIN to find just the exact one record you need for each date
SELECT TOP 1 *
FROM Table_1
WHERE [Key] = dates.[Key] AND cast(DT as DATE) = dates.DT ORDER BY [Value] DESC
) v
A final note: both this query and the sample query in your question will include values from Nov 6, 2016. The condition says > 2016-11-06 with an exclusive inequality, but it is compared against full DateTime values, so the literal carries an implied time component of midnight. That means 12:01 AM on Nov 6 is still greater than 12:00:00 AM on Nov 6. If you want to exclude all Nov 6 rows from the query, either change the literal to a time value at the end of that date, or cast DT to date before making the > comparison.
With SQL you can use SELECT DISTINCT.
The SELECT DISTINCT statement is used to return only distinct (different) values.
Inside a table, a column often contains many duplicate values; and sometimes you only want to list the different (distinct) values.
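On the asker's point that a calculated column cannot be used in the WHERE clause: a common workaround is to compute ROW_NUMBER() in a derived table and filter on it in the outer query. The sketch below tries this in SQLite via Python's sqlite3 module, which needs SQLite 3.25 or newer for window functions. The sample rows are invented, and date()/time() replace the T-SQL casts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Table_1 ("Key" INTEGER, DT TEXT, Value REAL)')
# Hypothetical half-hourly readings; the maximum of 100 occurs twice on 2016-11-08.
conn.executemany(
    "INSERT INTO Table_1 VALUES (?, ?, ?)",
    [
        (1000, "2016-11-07 07:00:00", 80),
        (1000, "2016-11-07 07:30:00", 75),
        (1000, "2016-11-08 17:30:00", 100),
        (1000, "2016-11-08 18:00:00", 100),
        (1000, "2016-11-08 18:30:00", 90),
    ],
)

rows = conn.execute(
    """
    SELECT Day, HH, Value
    FROM (
        SELECT date(DT) AS Day,
               time(DT) AS HH,
               Value,
               -- rank rows within each day: highest value first, earliest time breaks ties
               ROW_NUMBER() OVER (PARTITION BY date(DT) ORDER BY Value DESC, DT) AS rn
        FROM Table_1
        WHERE "Key" = 1000
    ) AS ranked
    WHERE rn = 1          -- the calculated column is visible in the outer query
    ORDER BY Day
    """
).fetchall()

for row in rows:
    print(row)
```

Each day comes back exactly once. For the tied day the earlier time is kept, but only because DT was added to the ORDER BY as a tie-breaker.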

How to get value by a range of dates?

I have a table like so
With this code I get the 5 latest values for each DomainId:
;WITH grp AS
(
SELECT DomainId, [Date],Passed, DatabasePerformance,ServerPerformance,
rn = ROW_NUMBER() OVER
(PARTITION BY DomainId ORDER BY [Date] DESC)
FROM dbo.DomainDetailDataHistory H
)
SELECT g.DomainId, g.[Date],g.Passed, g.ServerPerformance, g.DatabasePerformance
FROM grp g
INNER JOIN #Latest T ON T.DomainId = g.DomainId
WHERE rn < 7 AND t.date != g.[Date]
ORDER BY DomainId, [Date] DESC
What I Want
I would like to know how many tickets were sold for each of these 5 latest rows, under the following condition:
Each of these rows comes with its own date, and the dates differ.
For each date I want to check how many tickets were sold in the last 15 minutes and how many were sold in the last 30 minutes.
Example:
I get these 5 rows for each domainId
I want to extend the above with two columns, "soldTicketsLast15" and "soldTicketsLast30"
The Date column contains all the dates I need. For each of these dates I want to go back 15 minutes and 30 minutes and get how many tickets were sold.
Example:
SELECT MAX(SoldTickets) FROM DomainDetailDataHistory
WHERE [Date] >= DATEADD(minute, -15, '2016-04-12 12:10:28.2270000')
SELECT MAX(SoldTickets) FROM DomainDetailDataHistory
WHERE [Date] >= DATEADD(minute, -30, '2016-04-12 12:10:28.2270000')
How can I accomplish this?
I'd use OUTER APPLY or CROSS APPLY.
;WITH grp AS
(
SELECT
DomainId, [Date], Passed, DatabasePerformance, ServerPerformance,
rn = ROW_NUMBER() OVER (PARTITION BY DomainId ORDER BY [Date] DESC)
FROM dbo.DomainDetailDataHistory H
)
SELECT
g.DomainId, g.[Date],g.Passed, g.ServerPerformance, g.DatabasePerformance
,A15.SoldTicketsLast15
,A30.SoldTicketsLast30
FROM
grp g
INNER JOIN #Latest T ON T.DomainId = g.DomainId
OUTER APPLY
(
SELECT MAX(H.SoldTickets) - MIN(H.SoldTickets) AS SoldTicketsLast15
FROM DomainDetailDataHistory AS H
WHERE
H.DomainId = g.DomainId AND
H.[Date] >= DATEADD(minute, -15, g.[Date]) AND
H.[Date] <= g.[Date]
) AS A15
OUTER APPLY
(
SELECT MAX(H.SoldTickets) - MIN(H.SoldTickets) AS SoldTicketsLast30
FROM DomainDetailDataHistory AS H
WHERE
H.DomainId = g.DomainId AND
H.[Date] >= DATEADD(minute, -30, g.[Date]) AND
H.[Date] <= g.[Date]
) AS A30
WHERE
rn < 7
AND T.[date] != g.[Date]
ORDER BY DomainId, [Date] DESC;
To make the correlated APPLY queries efficient there should be an appropriate index, like the following:
CREATE NONCLUSTERED INDEX [IX_DomainId_Date] ON [dbo].[DomainDetailDataHistory]
(
[DomainId] ASC,
[Date] ASC
)
INCLUDE ([SoldTickets])
This index may also help to make the main part of your query (grp) efficient.
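For a quick sanity check of the per-row look-back, here is a sketch in SQLite via Python's sqlite3 module. SQLite has no APPLY, so correlated scalar subqueries are swapped in for it, and datetime(..., '-15 minutes') replaces DATEADD. The table is reduced to three columns and the rows are invented. SoldTickets is assumed to be a running counter, as the MAX - MIN expression implies, and the window here is bounded on both sides so that later rows are not counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE History (DomainId INTEGER, Date TEXT, SoldTickets INTEGER)")
# Hypothetical snapshots of a cumulative ticket counter for one domain.
conn.executemany(
    "INSERT INTO History VALUES (?, ?, ?)",
    [
        (1, "2016-04-12 12:00:00", 100),
        (1, "2016-04-12 12:10:00", 110),
        (1, "2016-04-12 12:20:00", 125),
        (1, "2016-04-12 12:40:00", 150),
    ],
)

rows = conn.execute(
    """
    SELECT g.Date,
           (SELECT MAX(h.SoldTickets) - MIN(h.SoldTickets)
            FROM History h
            WHERE h.DomainId = g.DomainId
              AND h.Date >= datetime(g.Date, '-15 minutes')
              AND h.Date <= g.Date) AS SoldTicketsLast15,
           (SELECT MAX(h.SoldTickets) - MIN(h.SoldTickets)
            FROM History h
            WHERE h.DomainId = g.DomainId
              AND h.Date >= datetime(g.Date, '-30 minutes')
              AND h.Date <= g.Date) AS SoldTicketsLast30
    FROM History g
    ORDER BY g.Date
    """
).fetchall()

for row in rows:
    print(row)
```

A row with no other snapshot inside its window reports 0, because MAX and MIN then both come from that row itself.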
If I understood your question correctly, you want to get the tickets sold from one of your dates (in the Date column) going back 15 minutes and 30 minutes. Assuming that you are using your DATEADD function correctly, the following should work:
SELECT MAX(SoldTickets) FROM DomainDetailDataHistory
WHERE [Date] BETWEEN DATEADD(minute, -15, '2016-04-12 12:10:28.2270000') AND '2016-04-12 12:10:28.2270000'
The BETWEEN operator retrieves rows between two date parameters, lower bound first and both ends inclusive. No GROUP BY is needed here, because MAX aggregates over all rows that pass the filter; add one (for example GROUP BY DomainId) only if you want a separate result per group.
The SQL above covers the period from 15 minutes before the date up to the date itself. You could do something similar for the 30 minutes.

Remove all duplicated records from a resultset(remove both)

I have a result set, generated as a CTE using UNION, that contains duplicate records, as in the image below:
And the query is:
WITH CTE (StartTime ,EndTime )
AS
(
SELECT StartTime ,EndTime, Null as Exclude, SupplierId FROM cms.TimeSlotMaster
WHERE Monday = 1 AND SupplierID IS NULL
UNION
SELECT StartTime ,EndTime FROM cms.TimeSlotOverRider
WHERE SupplierID IS NULL
AND StartDate <= cast(GETDATE() as DATE) AND EndDate >= cast(GETDATE() as DATE)
)
Now I am trying to remove the duplicated results from this result set entirely, so that neither copy remains. The final result set should contain only 2 rows and look like below:
Any help would be appreciated. Thanks.
For more information the first result set is generated using below CTE
You can use NOT EXISTS:
SELECT t.*
FROM dbo.TableName t
WHERE NOT EXISTS
(
SELECT 1 FROM dbo.TableName t2
WHERE t.ID <> t2.ID
AND t.StartTime = t2.StartTime
AND t.EndTime = t2.EndTime
)
or - if you don't have a primary key in this table:
WITH CTE AS
(
SELECT t.*, cnt = COUNT(*) OVER (PARTITION BY StartTime, EndTime)
FROM dbo.TableName t
)
SELECT StartTime, EndTime
FROM CTE
WHERE cnt = 1
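The same "keep only rows that occur once" idea can also be written with GROUP BY ... HAVING COUNT(*) = 1. The sketch below runs it in SQLite via Python's sqlite3 module on two simplified stand-in tables (names borrowed from the question, columns and rows invented). Note that it combines the sources with UNION ALL: plain UNION already collapses identical rows into one, which would hide the duplicates being counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TimeSlotMaster (StartTime TEXT, EndTime TEXT)")
conn.execute("CREATE TABLE TimeSlotOverRider (StartTime TEXT, EndTime TEXT)")
# Hypothetical slots; 10:00-12:00 appears in both tables.
conn.executemany(
    "INSERT INTO TimeSlotMaster VALUES (?, ?)",
    [("08:00", "10:00"), ("10:00", "12:00"), ("12:00", "14:00")],
)
conn.executemany(
    "INSERT INTO TimeSlotOverRider VALUES (?, ?)",
    [("10:00", "12:00"), ("14:00", "16:00")],
)

rows = conn.execute(
    """
    WITH AllSlots AS (
        SELECT StartTime, EndTime FROM TimeSlotMaster
        UNION ALL                      -- keep duplicates so they can be counted
        SELECT StartTime, EndTime FROM TimeSlotOverRider
    )
    SELECT StartTime, EndTime
    FROM AllSlots
    GROUP BY StartTime, EndTime
    HAVING COUNT(*) = 1                -- drop every slot that occurs more than once
    ORDER BY StartTime
    """
).fetchall()

for row in rows:
    print(row)
```

The 10:00-12:00 slot disappears completely, while the three slots that occur only once remain.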

Oracle - select rows with minimal value in a subset

I have a following table of dates:
dateID INT (PK),
personID INT (FK),
date DATE,
starttime VARCHAR, --Always in a format of 'HH:MM'
I want to pull the rows (all columns, including the PK) with the lowest date (primary condition) and starttime (secondary condition) for every person. For example, if we have
row1(date = '2013-04-01' and starttime = '14:00')
and
row2(date = '2013-04-02' and starttime = '08:00')
row1 will be retrieved, along with all other columns.
So far I have come up with gradually filtering the table, but it's quite a mess. Is there a more efficient way of doing this?
Here is what I made so far:
SELECT
D.id
, D.personid
, D.date
, D.starttime
FROM table D
JOIN (
SELECT --Select lowest time from the subset of lowest dates
A.personid,
B.startdate,
MIN(A.starttime) AS starttime
FROM table A
JOIN (
SELECT --Select lowest date for every person to exclude them from outer table
personid
, MIN(date) AS startdate
FROM table
GROUP BY personid
) B
ON A.personid = B.personid
AND A.date = B.startdate
GROUP BY
A.personid,
B.startdate
) C
ON C.personid = D.personid
AND C.startdate = D.date
AND C.starttime = D.starttime
It works, but I think there is a more clean/efficient way to do this. Any ideas?
EDIT: Let me expand a question - I also need to extract maximum date (only date, without time) for each person.
The result should look like this:
id
personid
max(date) for each person
min(date) for each person
min(starttime) for min(date) for each person
It is a part of a much larger query (the resulting table is joined with it), and the resulting table must be lightweight enough that the query won't execute for too long. With a single join with this table (just using min, max for each field I wanted) the query took about 3 seconds, and I would like the resulting query not to take longer than 2-3 times that.
You should be able to do this like:
select a.dateID, a.personID, a.date, a.max_date, a.starttime
from (select t.*,
max(t.date) over (partition by t.personID) max_date,
row_number() over (partition by t.personID
order by t.date, t.starttime) rn
from table t) a
where a.rn = 1;
sample data added to fiddle: http://sqlfiddle.com/#!4/63c45/1
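If the fiddle is unavailable, the analytic-function query above can be tried locally. This sketch runs it in SQLite via Python's sqlite3 module, which needs SQLite 3.25 or newer for window functions. The table name person_dates and all rows are made up, since "table" itself is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person_dates (dateID INTEGER PRIMARY KEY, personID INTEGER, date TEXT, starttime TEXT)"
)
# Hypothetical rows; person 1 has two entries on the earliest date.
conn.executemany(
    "INSERT INTO person_dates VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2013-04-01", "14:00"),
        (2, 1, "2013-04-02", "08:00"),
        (3, 1, "2013-04-01", "09:30"),
        (4, 2, "2013-05-10", "10:00"),
        (5, 2, "2013-05-09", "16:00"),
    ],
)

rows = conn.execute(
    """
    SELECT a.dateID, a.personID, a.date, a.max_date, a.starttime
    FROM (SELECT t.*,
                 -- latest date per person, repeated on every row of that person
                 MAX(t.date) OVER (PARTITION BY t.personID) AS max_date,
                 -- 1 = earliest date, then earliest starttime, per person
                 ROW_NUMBER() OVER (PARTITION BY t.personID
                                    ORDER BY t.date, t.starttime) AS rn
          FROM person_dates t) a
    WHERE a.rn = 1
    ORDER BY a.personID
    """
).fetchall()

for row in rows:
    print(row)
```

Because starttime is always stored as 'HH:MM', ordering it as text gives the same order as ordering by time, which is what the secondary sort key relies on.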
Here is a query you can use on its own; there is no need to incorporate it into your existing query. You can also use @Dazzal's query stand-alone.
SELECT ID, PERSONID, DATE, STARTTIME
FROM
(
    SELECT ID, PERSONID, DATE, STARTTIME, ROW_NUMBER() OVER(PARTITION BY personid ORDER BY DATE, STARTTIME) AS RN
    FROM TABLE
) A
WHERE
RN = 1
select a.id,a.accomp, a.accomp_name, a.start_year,a.end_year, a.company
from (select t.*,
min(t.start_year) over (partition by t.company) min_date,
max(t.end_year) over (partition by t.company) max_date,
row_number() over (partition by t.company
order by t.end_year desc) rn
from temp_123 t) a
where a.rn = 1;