Find multiple most recent dates before a given date efficiently? - sql

The following query takes 1.5s, and because I need to run it several thousand times, I would like to optimize it. Basically, I am trying to find, for each date in a provided array (e.g. ['2016-01-01', '2017-01-01', '2018-01-01']), the latest stored date that is less than or equal to it. Right now I'm querying each date individually:
SELECT date FROM date_history
WHERE ticker = 'APPL' AND date <= %(date)s
ORDER BY date DESC LIMIT 1;
I feel it might be faster if I could reuse the date sorting or something along those lines, but I can't think of a good way to do this. Any suggestions on how to make this faster would be appreciated!

You could use ROW_NUMBER:
WITH cte(d) AS (
VALUES ('2016-01-01'::date)
,('2017-01-01'::date)
,('2018-01-01'::date)
--Or unnest array_variable WITH ORDINALITY
), cte2 AS (
SELECT d.date, c.d,
ROW_NUMBER() OVER(PARTITION BY c.d ORDER BY d.date DESC) AS rn
FROM cte c
LEFT JOIN date_history d
ON d.date <= c.d
AND d.ticker = 'APPL' -- filter in ON so the LEFT JOIN keeps dates with no match
)
SELECT d, date AS max_date_before
FROM cte2
WHERE rn = 1
ORDER BY d ASC;
Alternatively LEFT JOIN LATERAL and correlated subquery:
WITH cte(d) AS (
VALUES ('2016-01-01'::date)
,('2017-01-01'::date)
,('2018-01-01'::date)
--Or unnest array_variable WITH ORDINALITY
)
SELECT *
FROM cte c
LEFT JOIN LATERAL (SELECT MAX(date) AS max_date_before
FROM date_history d
WHERE d.ticker = 'APPL'
AND d.date <= c.d) s ON TRUE;
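As a rough illustration of batching all lookups into a single statement, the sketch below runs the same idea against an in-memory SQLite database from Python. SQLite has no LATERAL, so a correlated scalar subquery stands in for it. The helper name latest_on_or_before, the sample rows, and the (ticker, date) index are assumptions made for the example and do not come from the question.

```python
import sqlite3


def latest_on_or_before(conn, ticker, targets):
    """Map each target date to the latest stored date <= target (None if none)."""
    if not targets:
        return {}
    # One "(?)" row per target so all dates travel in a single statement.
    values = ",".join("(?)" for _ in targets)
    sql = f"""
        WITH targets(d) AS (VALUES {values})
        SELECT t.d,
               (SELECT MAX(h.date)
                  FROM date_history h
                 WHERE h.ticker = ? AND h.date <= t.d) AS max_date_before
          FROM targets t
    """
    # Placeholders in the VALUES list come first in the SQL text, then the ticker.
    return dict(conn.execute(sql, [*targets, ticker]).fetchall())


# Hypothetical sample data; dates are ISO strings, which sort chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE date_history (ticker TEXT, date TEXT)")
conn.execute("CREATE INDEX ix_ticker_date ON date_history (ticker, date)")
conn.executemany(
    "INSERT INTO date_history VALUES (?, ?)",
    [
        ("APPL", "2015-12-31"),
        ("APPL", "2016-01-04"),
        ("APPL", "2016-12-30"),
        ("APPL", "2017-12-29"),
        ("MSFT", "2017-01-01"),
    ],
)

result = latest_on_or_before(
    conn, "APPL", ["2016-01-01", "2017-01-01", "2018-01-01", "2015-01-01"]
)
```

An index on (ticker, date) generally lets the database seek to each answer instead of sorting. Whether that accounts for the 1.5s in the original query is a guess, since the question shows no execution plan.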

Related

How to extrapolate dates in SQL Server to calculate the daily counts?

This is what the data looks like. It's a long table.
I need to calculate the number of people employed by day.
How do I write the SQL Server logic to get this result? I tried to create a DATES table and then join, but this caused an error because the table is too big. Do I need recursive logic?
For future questions, don't post images of data; use a service like dbfiddle instead. With a better-prepared question you could have gotten a complete answer, but here is a sketch anyway:
-- extrema is the least and the greatest date in staff table
with extrema(mn, mx) as (
select least(min(hired),min(retired)) as mn
, greatest(max(hired),max(retired)) as mx
from staff
), calendar (dt) as (
-- we construct a calendar with every date between extreme values
select mn from extrema
union all
select dateadd(day, 1, dt)
from calendar
where dt < (select mx from extrema)
)
-- finally we can count the number of employed people for each such date
select dt, count(1)
from calendar c
join staff s
on c.dt between s.hired and s.retired
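group by dt
-- recursive CTEs stop after 100 levels by default; lift the limit for longer ranges
option (maxrecursion 0);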
If you find yourself doing this kind of calculation often, it is a good idea to create a calendar table. You can add other attributes to it, such as whether a date falls on a weekday.
With a constraint as:
CHECK(hired <= retired)
the first part can be simplified to:
with extrema(mn, mx) as (
select min(hired) as mn
, max(retired) as mx
from staff
),
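The calendar-and-join logic above can also be sketched outside the database. The following Python version is only an illustration, assuming hired and retired are both inclusive dates and that hired <= retired, as in the constraint above. The function name daily_headcount and the sample rows are hypothetical.

```python
from datetime import date, timedelta


def daily_headcount(staff):
    """staff: list of (hired, retired) date pairs, both ends inclusive."""
    if not staff:
        return {}
    # Same extrema as the simplified CTE: earliest hire, latest retirement.
    start = min(hired for hired, _ in staff)
    end = max(retired for _, retired in staff)
    counts = {}
    day = start
    while day <= end:
        # Equivalent of "c.dt between s.hired and s.retired".
        counts[day] = sum(1 for hired, retired in staff if hired <= day <= retired)
        day += timedelta(days=1)
    return counts


staff = [
    (date(2020, 1, 1), date(2020, 1, 3)),
    (date(2020, 1, 2), date(2020, 1, 5)),
    (date(2020, 1, 8), date(2020, 1, 8)),
]
counts = daily_headcount(staff)
```

Unlike the inner join in the SQL sketch, this version keeps days on which nobody is employed and reports them as 0.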
Assuming current employees have a NULL retirement date:
Declare @Date1 date = '2015-01-01'
Declare @Date2 date = getdate()
Select A.Date
,HeadCount = count(B.name)
From ( Select Top (DateDiff(DAY,@Date1,@Date2)+1)
Date=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),@Date1)
From master..spt_values n1,master..spt_values n2
) A
Left Join YourTable B on A.Date >= B.Hired and A.Date <= coalesce(B.Retired,getdate())
Group BY A.Date
You need a calendar table for this. You start with the calendar, and LEFT JOIN everything else, using BETWEEN logic.
You can use a real table. Or you can generate it on the fly, like this:
WITH
L0 AS ( SELECT c = 1
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c) ),
L1 AS ( SELECT c = 1 FROM L0 A, L0 B, L0 C, L0 D ),
Nums AS ( SELECT rownum = ROW_NUMBER() OVER(ORDER BY (SELECT 1))
FROM L1 ),
Dates AS (
SELECT TOP (DATEDIFF(day, '20141231', GETDATE()))
Date = DATEADD(day, rownum, '20141231')
FROM Nums
)
SELECT
d.Date,
NumEmployed = COUNT(t.Hired)
FROM Dates d
LEFT JOIN YourTable t ON d.Date BETWEEN t.Hired AND t.Retired
GROUP BY
d.Date;
If your dates have a time component, then you need to use >= AND < logic instead of BETWEEN.
Try limiting the scope of your date table. In this example I have a table of dates named TallyStickDT.
SELECT dt, COUNT(name)
FROM (
SELECT dt
FROM tallystickdt
WHERE dt >= (SELECT MIN(hired) FROM #employees)
AND dt <= GETDATE()
) A
LEFT OUTER JOIN #employees E ON A.dt >= E.Hired AND A.dt <= e.retired
GROUP BY dt
ORDER BY dt

Oracle query to fill in the missing data in the same table

I have a table in Oracle which has missing data for a given id. I am trying to figure out the SQL to fill in the data from start date 01/01/2019 to end date 10/1/2020. See the input data below. The status key can be filled in based on its previous status key. Input:
expected output:
You can use a recursive query to generate the dates, then cross join that with the list of distinct ids available in the table. Then, use window functions to bring the missing key values:
with cte (mon) as (
select date '2019-01-01' mon from dual
union all select add_months(mon, 1) from cte where mon < date '2020-10-01'
)
select i.id,
coalesce(
t.status_key,
lead(t.previous_status_key ignore nulls) over(partition by i.id order by c.mon)
) as status_key,
coalesce(
t.previous_status_key,
lag(t.status_key ignore nulls, 1, -1) over(partition by i.id order by c.mon)
) previous_status_key,
c.mon
from cte c
cross join (select distinct id from mytable) i
left join mytable t on t.mon = c.mon and t.id = i.id
You did not give a lot of details on how to bring the missing status_keys and previous_status_keys. Here is what the query does:
status_key is taken from the next non-null previous_status_key
previous_status_key is taken from the last non-null status_key, with a default of -1
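The two fill rules can be traced by hand with a small Python sketch. It assumes that rows present in the table keep their own values and that only missing values are filled. The function name fill_missing, the month labels, and the sample keys are made up for illustration.

```python
def fill_missing(months, known):
    """months: ordered month labels; known: {month: (status_key, previous_status_key)}."""
    filled = []
    last_status = -1  # default when no earlier status_key exists
    for i, mon in enumerate(months):
        own_status, own_prev = known.get(mon, (None, None))
        # Rule 1: next non-null previous_status_key among the later months.
        next_prev = next(
            (known[m][1] for m in months[i + 1:]
             if m in known and known[m][1] is not None),
            None,
        )
        status = own_status if own_status is not None else next_prev
        # Rule 2: last non-null status_key among the earlier months.
        prev = own_prev if own_prev is not None else last_status
        filled.append((mon, status, prev))
        if own_status is not None:
            last_status = own_status
    return filled


months = ["2019-01", "2019-02", "2019-03", "2019-04"]
known = {"2019-02": (5, 3), "2019-04": (7, 5)}
rows = fill_missing(months, known)
```

With this sample, January inherits status 3 (the previous_status_key recorded in February) and March inherits status 5 on both columns, which matches the idea that a status holds until the next recorded change.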
You can generate the dates and then use cross join and some additional logic to get the information you want:
with dates (mon) as (
select date '2019-01-01' as mon
from dual
union all
select mon + interval '1' month
from dates
where mon < date '2021-01-01'
)
select d.mon, i.id,
coalesce(t.status_key,
lag(t.status_key ignore nulls) over (partition by i.id order by d.mon)
) as status_key,
coalesce(t.previous_status_key,
lag(t.previous_status_key ignore nulls) over (partition by i.id order by d.mon)
) as previous_status_key
from dates d cross join
(select distinct id from t) i left join
t
on d.mon = t.mon and t.id = i.id;

How to get value by a range of dates?

I have a table like so
With this code I get the 5 latest values for each DomainId:
;WITH grp AS
(
SELECT DomainId, [Date],Passed, DatabasePerformance,ServerPerformance,
rn = ROW_NUMBER() OVER
(PARTITION BY DomainId ORDER BY [Date] DESC)
FROM dbo.DomainDetailDataHistory H
)
SELECT g.DomainId, g.[Date],g.Passed, g.ServerPerformance, g.DatabasePerformance
FROM grp g
INNER JOIN #Latest T ON T.DomainId = g.DomainId
WHERE rn < 7 AND t.date != g.[Date]
ORDER BY DomainId, [Date] DESC
What I Want
I would like to know how many tickets were sold for each of these 5 latest rows, with the following condition:
Each of these rows comes with its own date.
For each date I want to check how many tickets were sold in the last 15 minutes and how many in the last 30 minutes.
Example:
I get these 5 rows for each domainId
I want to extend the above with two columns, "soldTicketsLast15" and "soldTicketsLast30"
The date column contains all the dates I need, and for each of these dates I want to go back 15 minutes and 30 minutes and get how many tickets were sold.
Example:
SELECT MAX(SoldTickets) FROM DomainDetailDataHistory
WHERE [Date] >= DATEADD(minute, -15, '2016-04-12 12:10:28.2270000')
SELECT MAX(SoldTickets) FROM DomainDetailDataHistory
WHERE [Date] >= DATEADD(minute, -30, '2016-04-12 12:10:28.2270000')
How can I accomplish this?
I'd use OUTER APPLY or CROSS APPLY.
;WITH grp AS
(
SELECT
DomainId, [Date], Passed, DatabasePerformance, ServerPerformance,
rn = ROW_NUMBER() OVER (PARTITION BY DomainId ORDER BY [Date] DESC)
FROM dbo.DomainDetailDataHistory H
)
SELECT
g.DomainId, g.[Date],g.Passed, g.ServerPerformance, g.DatabasePerformance
,A15.SoldTicketsLast15
,A30.SoldTicketsLast30
FROM
grp g
INNER JOIN #Latest T ON T.DomainId = g.DomainId
OUTER APPLY
(
SELECT MAX(H.SoldTickets) - MIN(H.SoldTickets) AS SoldTicketsLast15
FROM DomainDetailDataHistory AS H
WHERE
H.DomainId = g.DomainId AND
H.[Date] >= DATEADD(minute, -15, g.[Date]) AND
H.[Date] <= g.[Date]
) AS A15
OUTER APPLY
(
SELECT MAX(H.SoldTickets) - MIN(H.SoldTickets) AS SoldTicketsLast30
FROM DomainDetailDataHistory AS H
WHERE
H.DomainId = g.DomainId AND
H.[Date] >= DATEADD(minute, -30, g.[Date]) AND
H.[Date] <= g.[Date]
) AS A30
WHERE
rn < 7
AND T.[date] != g.[Date]
ORDER BY DomainId, [Date] DESC;
To make the correlated APPLY queries efficient there should be an appropriate index, like the following:
CREATE NONCLUSTERED INDEX [IX_DomainId_Date] ON [dbo].[DomainDetailDataHistory]
(
[DomainId] ASC,
[Date] ASC
)
INCLUDE ([SoldTickets])
This index may also help to make the main part of your query (grp) efficient.
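To make the MAX minus MIN idea concrete, here is a small Python sketch of the per-row calculation. It assumes SoldTickets is a running total for one domain and that the window is bounded on both sides (from the row's date back N minutes), which is an interpretation of the question. The function name sold_in_window and the sample readings are hypothetical.

```python
from datetime import datetime, timedelta


def sold_in_window(history, at, minutes):
    """history: list of (timestamp, cumulative_sold_tickets) for one domain."""
    start = at - timedelta(minutes=minutes)
    # Keep only readings inside [at - minutes, at].
    window = [sold for ts, sold in history if start <= ts <= at]
    # Tickets sold in the window = growth of the running total.
    return max(window) - min(window) if window else None


history = [
    (datetime(2016, 4, 12, 12, 0), 100),
    (datetime(2016, 4, 12, 12, 10), 110),
    (datetime(2016, 4, 12, 12, 20), 125),
    (datetime(2016, 4, 12, 12, 30), 140),
]
at = datetime(2016, 4, 12, 12, 30)
last15 = sold_in_window(history, at, 15)
last30 = sold_in_window(history, at, 30)
```

If only one reading falls inside the window, the difference is 0, so sparse data will under-report sales. The same caveat applies to the APPLY version.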
If I understood your question correctly, you want to get the tickets sold from one of your dates (in the Date column) going back 15 minutes and 30 minutes. Assuming that you are using your DATEADD function correctly, the following should work:
SELECT MAX(SoldTickets) FROM DomainDetailDataHistory
WHERE [Date] BETWEEN DATEADD(minute, -15, '2016-04-12 12:10:28.2270000') AND '2016-04-12 12:10:28.2270000'
The BETWEEN operator returns rows whose value lies between two bounds (inclusive), with the lower bound written first. No GROUP BY is needed here: an aggregate such as MAX over the whole filtered set returns a single row. Add a GROUP BY only if you want one result per group, for example per DomainId.
The SQL above will give you the ones between the date and 15 minutes back. You could do something similar with the 30 minutes back.

SQL adding missing dates to query

I'm trying to add missing dates to a SQL query but it does not work.
Please can you tell me what I'm doing wrong.
I only have read-only rights to the database.
SQL query:
With cteDateGen AS
(
SELECT 0 as Offset, CAST(DATEADD(dd, 0, '2015-11-01') AS DATE) AS WorkDate
UNION ALL
SELECT Offset + 1, CAST(DATEADD(dd, Offset, '2015-11-05') AS DATE)
FROM cteDateGen
WHERE Offset < 100
), -- generate date from to --
cte AS (
SELECT COUNT(*) OVER() AS 'total' ,ROW_NUMBER()OVER (ORDER BY c.dt DESC) as row
, c.*
FROM clockL c
RIGHT JOIN cteDateGen d ON CAST(c.dt AS DATE) = d.WorkDate
WHERE
c.dt between '2015-11-01' AND '2015-11-05' and
--d.WorkDate BETWEEN '2015-11-01' AND '2015-11-05'
and c.id =10
) -- select user log and add missing dates --
SELECT *
FROM cte
--WHERE row BETWEEN 0 AND 15
--option (maxrecursion 0)
I think your problem is simply the dates in the CTE. You can also simplify it a bit:
With cteDateGen AS (
SELECT 0 as Offset, CAST('2015-11-01' AS DATE) AS WorkDate
UNION ALL
SELECT Offset + 1, DATEADD(day, 1, WorkDate)
-------------------^
FROM cteDateGen
WHERE Offset < 100
), -- generate date from to --
cte AS
(SELECT COUNT(*) OVER () AS total,
ROW_NUMBER() OVER (ORDER BY c.dt DESC) as row,
c.*
FROM cteDateGen d LEFT JOIN
clockL c
ON CAST(c.dt AS DATE) = d.WorkDate AND c.id = 10
-----------------------------------------------^
WHERE d.WorkDate between '2015-11-01' AND '2015-11-05'
) -- select user log and add missing dates --
SELECT *
FROM cte
Notes:
Your query used a constant for the second date in the CTE. The constant was different from the first constant. Hence, it was missing some days.
I think that LEFT JOIN is much easier to follow than RIGHT JOIN. LEFT JOIN is basically "keep all rows in the first table".
The WHERE clause was undoing the outer join in any case. The c.id logic needs to move to the ON clause.
The date arithmetic in the first CTE was unnecessarily complex.
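The note about the WHERE clause undoing the outer join is easy to check with a tiny experiment. The sketch below uses Python with an in-memory SQLite database rather than SQL Server, so the date functions differ, and the sample rows in clockL are invented. Only the placement of the c.id = 10 filter changes between the two queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clockL (id INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO clockL VALUES (?, ?)",
    [
        (10, "2015-11-01 08:00:00"),
        (10, "2015-11-03 09:30:00"),
        (11, "2015-11-02 07:45:00"),
    ],
)

# Five consecutive days, generated with a recursive CTE (SQLite syntax).
CALENDAR = """
    WITH RECURSIVE cal(WorkDate) AS (
        SELECT '2015-11-01'
        UNION ALL
        SELECT date(WorkDate, '+1 day') FROM cal WHERE WorkDate < '2015-11-05'
    )
"""

# Filter in ON: every calendar day survives, unmatched days get NULL.
filter_in_on = conn.execute(CALENDAR + """
    SELECT d.WorkDate, c.dt
      FROM cal d LEFT JOIN clockL c
        ON date(c.dt) = d.WorkDate AND c.id = 10
     ORDER BY d.WorkDate
""").fetchall()

# Filter in WHERE: NULL rows fail "c.id = 10", so the gaps disappear again.
filter_in_where = conn.execute(CALENDAR + """
    SELECT d.WorkDate, c.dt
      FROM cal d LEFT JOIN clockL c
        ON date(c.dt) = d.WorkDate
     WHERE c.id = 10
     ORDER BY d.WorkDate
""").fetchall()
```

The first result has one row per calendar day, with None for days without a log entry for id 10. The second contains only the two days that actually have entries.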

Hits per day in Google Big Query

I am using Google BigQuery to find hits per day. Here is my query:
SELECT COUNT(*) AS Key,
DATE(EventDateUtc) AS Value
FROM [myDataSet.myTable]
WHERE .....
GROUP BY Value
ORDER BY Value DESC
LIMIT 1000;
This works fine, but it omits dates with 0 hits, and I want to include them. I cannot create a temp table in Google BigQuery. How can I fix this?
I tested the query below and got the error: Field 'day' not found.
SELECT COUNT(*) AS Key,
DATE(t.day) AS Value from (
select date(date_add(day, i, "DAY")) day
from (select '2015-05-01 00:00' day) a
cross join
(select
position(
split(
rpad('', datediff(CURRENT_TIMESTAMP(),'2015-05-01 00:00')*2, 'a,'))) i
from (select NULL)) b
) d
left join [sample_data.requests] t on d.day = t.day
GROUP BY Value
ORDER BY Value DESC
LIMIT 1000;
You can only query data that exists in your tables; the query cannot guess which dates are missing. You need to handle this either in your programming language, or by joining with a numbers table and generating the dates on the fly.
If you know the date range you have in your query, you can generate the days:
select date(date_add(day, i, "DAY")) day
from (select '2015-01-01' day) a
cross join
(select
position(
split(
rpad('', datediff('2015-01-15','2015-01-01')*2, 'a,'))) i
from (select NULL)) b;
Then you can join this result with your query table:
SELECT COUNT(t.day) AS Key,
DATE(d.day) AS Value from (...the.above.query.pasted.here...) d
left join [myDataSet.myTable] t on d.day = t.day
WHERE .....
GROUP BY Value
ORDER BY Value DESC
LIMIT 1000;
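For the other option mentioned above, handling the gaps in your programming language, a minimal Python sketch could look like this. It assumes the grouped query result has already been loaded into a dict keyed by date. The function name fill_missing_days and the sample counts are hypothetical.

```python
from datetime import date, timedelta


def fill_missing_days(hits_by_day, start, end):
    """Return (day, hits) for every day in [start, end], using 0 where no row exists."""
    days = (end - start).days + 1
    result = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        result.append((day, hits_by_day.get(day, 0)))
    return result


# Hypothetical output of the original GROUP BY query.
hits = {date(2015, 5, 1): 12, date(2015, 5, 3): 7}
filled = fill_missing_days(hits, date(2015, 5, 1), date(2015, 5, 4))
```

This keeps the query itself unchanged and moves the zero-filling to the client, which may be simpler than generating dates inside BigQuery when the result set is small.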