How can I see point-in-time rolling five-week counts of distinct values? - sql

I am trying to see the point-in-time rolling five-week count of distinct employees paid. For example, in week 48 I would need to see the count of distinct employees paid in weeks 44 through 48. I think I have to include something like "WHERE Week_Number BETWEEN Week_Number -5 AND Week_Number" but I am not sure how to make this work. The output should just be the Year, Week Number, and count of distinct employee IDs.
SELECT Week_Number,
       Year,
       Account,
       COUNT(DISTINCT EmployeeID) AS EmployeeCount
FROM [Table]
GROUP BY Week_Number, Year, Account

I assume that you have a data table like this:
YearNumber | WeekNumber | Account | EmployeeID
----------------------------------------------
2019 | 51 | 101 | 1
2019 | 48 | 101 | 2
And this is the result you want to see:
YearNumber | WeekNumber | Account | Quantity
----------------------------------------------
2019 | 48 | 101 | 1
2019 | 49 | 101 | 1
2019 | 50 | 101 | 1
2019 | 51 | 101 | 2
2019 | 52 | 101 | 2
2020 | 1 | 101 | 1
2020 | 2 | 101 | 1
2020 | 3 | 101 | 1
So one person starts paying in week 48 and another in week 51, which means their five-week windows on account 101 overlap in weeks 51 and 52; in the other weeks, only one person counts toward the account.
To also answer your question in the comment: this, I think, is a good way to provide sample data and an expected result when you ask on SO.
The query which helped me produce the results above:
SELECT
    d.Year + IIF((d.Week + n.Number - 1) >= 52, 1, 0) AS Year,
    (d.Week + n.Number - 1) % 52 + 1 AS Week,
    d.AccountID,
    COUNT(DISTINCT d.EmployeeID) AS Quantity
FROM Data d
CROSS APPLY (SELECT Number FROM Number WHERE Number BETWEEN 0 AND 4) n
GROUP BY
    d.Year + IIF((d.Week + n.Number - 1) >= 52, 1, 0), -- Year
    (d.Week + n.Number - 1) % 52 + 1,                  -- Week
    d.AccountID
This uses a Number table, which is basically a table containing consecutive integers; such tables help a lot in queries like this. The code also has minimal handling for the year rollover, but be aware that you may need to account for ISO years containing 53 weeks. Note the COUNT(DISTINCT ...) so an employee paid in several weeks of the same window is only counted once.
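If you don't have a Number table yet, one is quick to set up. A minimal sketch (the name matches the query above; seeding from sys.all_objects is just one convenient row source):

-- Create and seed a Number table with 0..999.
CREATE TABLE Number (Number int PRIMARY KEY);

INSERT INTO Number (Number)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
FROM sys.all_objects;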

Related

get average x months weekly in SQL

I need to get the weekly average over the past 6 months. I tried the query below, but it does not produce correct results because I am getting the same average for each week.
select s.*,
       avg(sales) over (partition by sku_id,
                        substring(dateweeknum, 1, 4),
                        substring(dateweeknum, 6, 2)) as avg_sum
from
(
    select sku_id, dateweeknum,
           sum(sales_data) as Sales
    from table
    where dateval >= Dateadd(Month, Datediff(Month, 0, DATEADD(m, -6, dateval)), 0)
    group by sku_id, dateweeknum
) s
My data is like this:
dateweeknum: 2020/01-04 (format: year/month-week)
dateval: 20200928 (yyyymmdd)
Sample:
| sku_id | dateweeknum | dateval  | sales_data |
| ------ | ----------- | -------- | ---------- |
| ab124  | 2021/06-01  | 20210603 | 10         |
| ab124  | 2021/05-01  | 20210502 | 20         |
| ab124  | 2021/06-01  | 20210606 | 30         |
| ab123  | 2021/06-01  | 20210606 | 30         |
Expected result:
| week | year | sales | AvgSales |
| ---- | ---- | ----- | -------- |
| 01   | 2021 | 60    | 2.3      |
(60 is the week-1 sum; 6 months is roughly 26 weeks; 60 / 26 = 2.3)
What I actually need is total sales / total number of weeks (with only ab124 in the WHERE condition), but I am not able to get the correct average. Can anyone please suggest where I am going wrong?
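One possible approach, as a hedged sketch (untested; the table name sales_table is assumed): aggregate per week, then divide each weekly sum by the fixed 26-week window described above, instead of partitioning the average by week as the original query does:

-- Weekly sums for the last 6 months, each divided by the fixed 26-week window.
select sku_id,
       dateweeknum,
       sum(sales_data) as weekly_sales,
       sum(sales_data) / 26.0 as avg_sales  -- e.g. 60 / 26 = 2.3 for week 1
from sales_table
where sku_id = 'ab124'
  and dateval >= cast(convert(char(8), dateadd(month, -6, getdate()), 112) as int)
group by sku_id, dateweeknum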

What's the most efficient way to calculate a rolling aggregate in SQL?

I have a dataset that includes a bunch of clients and date ranges that they had a "stay." For example:
| ClientID | DateStart | DateEnd |
+----------+-----------+---------+
| 1 | Jan 1 | Jan 31 | (datediff = 30)
| 1 | Apr 4 | May 4 | (datediff = 30)
| 2 | Jan 3 | Feb 27 | (datediff = 55)
| 3 | Jan 1 | Jan 7 | (datediff = 6)
| 3 | Jan 10 | Jan 17 | (datediff = 6)
| 3 | Jan 20 | Jan 27 | (datediff = 6)
| 3 | Feb 1 | Feb 7 | (datediff = 6)
| 3 | Feb 10 | Feb 17 | (datediff = 6)
| 3 | Feb 20 | Feb 27 | (datediff = 6)
My ultimate goal is to be able to identify the dates on which a client passed a threshold of N nights in the past X time. Let's say 30 days in the last 90 days. I also need to know when they pass out of the threshold. Use case: hotel stays and a VIP status.
In the example above, Client 1 passed the threshold on Jan 31 (had 30 nights in past 90 days), and still kept meeting the threshold until April 2 (now only 29 nights in the past 90 days), but passed the threshold again on May 4.
Client 2 passed the threshold on Feb 3, and kept meeting the threshold until April 28th, at which point the earliest days are more than 90 days ago and they expire.
Client 3 passed the threshold around Feb 17.
So I would like to generate a table like this:
| ClientID | VIPStart | VIPEnd |
+----------+-----------+---------+
| 1 | Jan 31 | Apr 2 |
| 1 | May 4 | Jul 5 |
| 2 | Feb 3 | Apr 28 |
| 3 | Feb 17 | Apr 11 |
(Forgive me if the dates are slightly off, I'm doing this in my head)
Ideally I would like to generate a view, as I will need to reference it often.
What I want to know is what's the most efficient way to generate this? Assuming I have thousands of clients and hundreds of thousands of stays.
The way that I've been approaching this so far is a SQL statement with a parameter: as of {?Date}, who had VIP status and who didn't. I do that by calculating DATEADD(day, -90, {?Date}), excluding the records that fall outside that range, truncating DateStarts that extend earlier and DateEnds that extend later, calculating DATEDIFF(day, DateStart, DateEnd) for the resulting adjusted stays, and then taking the SUM() of those DATEDIFF() values for each client as of {?Date}. It works, but it's not pretty. And it gives me a point-in-time snapshot; I want the history.
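In code, that per-{?Date} snapshot looks something like this (a sketch of the approach just described; table and column names taken from the sample data, {?Date} being the report parameter):

-- Nights in the 90 days before {?Date}: clip each stay to the window,
-- take DATEDIFF of the adjusted bounds, and sum per client.
SELECT ClientID,
       SUM(DATEDIFF(day,
               CASE WHEN DateStart < DATEADD(day, -90, {?Date})
                    THEN DATEADD(day, -90, {?Date}) ELSE DateStart END,
               CASE WHEN DateEnd > {?Date}
                    THEN {?Date} ELSE DateEnd END)) AS NightsLast90
FROM Stays
WHERE DateEnd >= DATEADD(day, -90, {?Date})
  AND DateStart <= {?Date}
GROUP BY ClientID
-- clients with NightsLast90 >= 30 hold VIP status as of {?Date}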
It seems a little inefficient to generate a table of dates and then, for every single date, apply the above method.
Another option I considered was converting the raw data into an exploded table with each record corresponding to one night, then I can count it easier. Like this:
| ClientID | StayDate |
+----------+-----------+
| 1 | Jan 1 |
| 1 | Jan 2 |
| 1 | Jan 3 |
| 1 | Jan 4 |
etc.
Then I could just add a column counting the number of days in the past 90 days, and that'll get me most of the way there.
But I'm not sure how to do that in a view. I have a code snippet that does this:
WITH DaysTally AS (
    SELECT MAX(DATEDIFF(day, DateStart, DateEnd)) - 1 AS Tally
    FROM Stays
    UNION ALL
    SELECT Tally - 1 AS Expr1
    FROM DaysTally AS DaysTally_1
    WHERE (Tally - 1 >= 0)
)
SELECT t.ClientID,
       DATEADD(day, c.Tally, t.DateStart) AS "StayDate"
FROM Stays AS t
INNER JOIN DaysTally AS c
    ON DATEDIFF(day, t.DateStart, t.DateEnd) - 1 >= c.Tally
OPTION (MAXRECURSION 0)
But I can't get it to work without MAXRECURSION, and I don't think you can save a view with MAXRECURSION.
And now I'm rambling. So the help that I'm looking for is: what is the most efficient method to pursue my goal? And if you have a code example, that would be helpful too! Thanks.
This is an interesting and pretty well-asked question. I would start by enumerating the days from the beginning of the first stay of each client until 90 days after the end of its last stay with a recursive cte. You can then bring in the stay table with a left join, and use window functions to flag the "VIP" days (note that this assumes no overlapping stays for a given client, which is consistent with your sample data).
What follows is gaps-and-islands: you can use a window sum to put "adjacent" VIP days in groups, and then aggregate.
with cte as (
    select clientID, min(dateStart) dt, dateadd(day, 90, max(dateEnd)) dateMax
    from stays
    group by clientID
    union all
    select clientID, dateadd(day, 1, dt), dateMax
    from cte
    where dt < dateMax
)
select clientID, min(dt) VIPStart, max(dt) VIPEnd
from (
    select t.*, sum(isNotVip) over(partition by clientID order by dt) grp
    from (
        select
            c.clientID,
            c.dt,
            case when count(s.clientID) over(
                partition by c.clientID
                order by c.dt
                rows between 90 preceding and current row
            ) >= 30
                then 0
                else 1
            end isNotVip
        from cte c
        left join stays s
            on c.clientID = s.clientID and c.dt between s.dateStart and s.dateEnd
    ) t
) t
where isNotVip = 0
group by clientID, grp
order by clientID, VIPStart
option (maxrecursion 0)
This demo on DB Fiddle with your sample data produces:
clientID | VIPStart   | VIPEnd
-------- | ---------- | ----------
1        | 2020-01-30 | 2020-04-01
1        | 2020-05-03 | 2020-07-04
2        | 2020-02-01 | 2020-04-28
3        | 2020-02-07 | 2020-04-20
You can put this in a view as follows:
- the order by and option(maxrecursion) clauses must be omitted when creating the view
- each and every query that has the view in its from clause must end with option(maxrecursion 0)
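A toy sketch of that pattern (the view name and the 200-row depth are invented for illustration; the default MAXRECURSION limit is 100, so the hint on the outer query is what makes it run):

-- The view itself omits ORDER BY and OPTION (MAXRECURSION):
CREATE VIEW dbo.Numbers200 AS
WITH n AS (
    SELECT 1 AS i
    UNION ALL
    SELECT i + 1 FROM n WHERE i < 200
)
SELECT i FROM n;
GO
-- Every query that uses the view supplies the hint instead:
SELECT * FROM dbo.Numbers200 OPTION (MAXRECURSION 0);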
You can eliminate the recursion by creating a tally table in the view. The approach is then the following:
1. For each period, generate dates from 90 days before the period to 90 days after. These are all the "candidate days" that the period could affect.
2. For each row, add a flag as to whether it is in the period (as opposed to the 90 days before and after).
3. Aggregate by client id and date.
4. Use a running sum to get the days with 30+ in the previous 90 days.
5. Then filter for the ones with 30+ days and treat this as a gaps-and-islands problem.
Assuming that 1000 days is sufficient for the periods (including the 90 days before and after), then the query looks like this:
with n as (
select v.n
from (values (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v(n)
),
nums as (
select (n1.n * 100 + n2.n * 10 + n3.n) as n
from n n1 cross join n n2 cross join n n3
),
running90 as (
select clientid, dte, sum(in_period) over (partition by clientid order by dte rows between 89 preceding and current row) as running_90
from (select t.clientid, dateadd(day, n.n - 90, datestart) as dte,
max(case when dateadd(day, n.n - 90, datestart) >= datestart and dateadd(day, n.n - 90, datestart) <= t.dateend then 1 else 0 end) as in_period
from t join
nums n
on dateadd(day, n.n - 90, datestart) <= dateadd(day, 90, dateend)
group by t.clientid, dateadd(day, n.n - 90, datestart)
) t
)
select clientid, min(dte), max(dte)
from (select r.*,
row_number() over (partition by clientid order by dte) as seqnum
from running90 r
where running_90 >= 30
) r
group by clientid, dateadd(day, - seqnum, dte);
Having no recursive CTE (although one could be used for n), this is not subject to the maxrecursion issue.
Here is a db<>fiddle.
The results are slightly different from your results. This is probably due to some slight difference in the definitions. The above includes the end day as an "occupied" day, and its 90-day window is 89 preceding days plus the current day. The second-to-last query shows the running 90-day counts, and those seem correct to me.

SQL get the time of different rows

I want to do a select that gives me the time an employee took to resolve a ticket.
The problem is that the ticket is divided into actions, so it's not just about getting the time from a single row; it can come from n rows.
This is an abbreviation of what I have:
Tickets
TicketID | Days | Hours | Minutes
------------------------------------------------
12 | 0 | 2 | 32
12 | 1 | 0 | 12
12 | 4 | 6 | 0
13 | 2 | 5 | 12
13 | 0 | 2 | 33
And this is what I want to get:
TicketID | Time (in minutes)
------------------------------------------------
12 | 2924
13 | 1425
(Or just one row, with a WHERE condition specifying the TicketID)
This is the select that I'm doing right now:
select distinct ((Days*8)*60) + (Hours*60) + Minutes from Tickets where ticketid = 12
But it is not working as I want.
select ticketid, sum((Days * 8) * 60), sum(Hours * 60), sum(Minutes)
from tickets
group by ticketid
select TicketID, sum((Days*8)*60) + sum(Hours*60) + sum(Minutes) as Time_in_minutes
from Tickets
group by TicketID
Distinct, as you were trying before, takes each row in the source table (Tickets) and filters out all of the duplicate rows. Instead, you want to sum up the days, hours, and minutes for each ticket. So sum them up and group by the ticket number.
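For ticket 12, for example, the three actions come to 152, 492, and 2280 minutes under the question's 8-hour-day formula; DISTINCT simply returns those three values as separate rows, whereas SUM with GROUP BY yields the single 2924-minute total.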
Try this:
SELECT TicketID,
       (SUM(Minutes) + (SUM(Hours) * 60) + (SUM(Days) * 24 * 60)) AS time
FROM Tickets
GROUP BY TicketID
-- Note: this treats a day as 24 hours; use SUM(Days) * 8 * 60 if a "day" is
-- the 8-hour workday implied by the question's own formula.

Oracle SQL: How to eliminate redundant recursive calls in CTE

The below set represents the sales of a product in consecutive weeks.
22,19,20,23,16,14,15,15,18,21,24,10,17
...
weekly sales table
date sales
week-1 : 22
week-2 : 19
week-3 : 20
...
week-12 : 10
week-13 : 17
I need to find the longest run of equal-or-higher sales figures for consecutive weeks, i.e. week-6 to week-11, represented by 14,15,15,18,21,24.
I am trying to use a recursive CTE to move forward to the next week(s) to find if the sales value is equal or higher. As long as the value is equal or higher, keep on moving to the next week, recording the ROWNUMBER of the anchor member (represents the starting week number) and the week number of the iterated row. With this approach, there are redundant recursive calls. For example, when cte is called for week-2, it iterates week-3, week-4 and week-5 as the sales values are higher on each week from its previous week. Now, after week-2, the cte should be called for week-5 as week-3, week-4 and week-5 have already been visited.
Basically, if I have already visited a row of filt_coll in my recursive calls, I do not want it to be passed to the CTE again. The rows marked as redundant should not be found and the values for actualweek column should be unique.
I know the SQL below does not give a solution to my problem of finding the longest run of higher values; I can work that out from the max count of the startweek column. For now, I am trying to figure out how to eliminate the redundant recursive calls.
START_WEEK | SALES | SALESLAG | SALESLEAD | ACTUALWEEK
1 | 22 | 0 | -3 | 1
2 | 19 | -3 | 1 | 2
2 | 20 | 1 | 3 | 3
2 | 23 | 3 | -7 | 4
3 | 20 | 1 | 3 | 3 <-(redundant)
3 | 23 | 3 | -7 | 4 <-(redundant)
4 | 23 | 3 | -7 | 4 <-(redundant)
6 | 14 | -2 | 1 | 6
...
with
-- begin test data
raw_data (sales) as
(
select '22,19,20,23,16,14,15,15,18,21,24,10,17' from dual
)
,
derived_tbl(week, sales) as
(
select level, regexp_substr(sales, '([[:digit:]]+)(,|$)', 1, level, null, 1)
from raw_data connect by level <= regexp_count(sales,',')+1
)
-- end test data
,
coll(week, sales, saleslag, saleslead) as
(
select week, sales,
nvl(sales - (lag(sales) over (order by week)), 0),
nvl((lead(sales) over (order by week) - sales), 0)
from derived_tbl
)
,
filt_coll(week, sales, saleslag, saleslead) as
(
select week, sales, saleslag, saleslead
from coll
where not (saleslag < 0 and saleslead < 0)
)
,
cte(startweek, sales, saleslag, saleslead, actualweek) as
(
select week, sales, saleslag, saleslead, week from filt_coll
-- where week not in (select week from cte)
-- *** want to achieve the effect of the above commented out line
union all
select cte.startweek, cl.sales, cl.saleslag, cl.saleslead, cl.week
from filt_coll cl, cte
where cl.week = cte.actualweek + 1 and cl.sales >= cte.sales
)
select * from cte
order by 1,actualweek
;
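For comparison, the redundant calls disappear entirely if the recursion is dropped: the longest non-decreasing run is a classic gaps-and-islands problem. A sketch reusing the question's test data, with sales cast to a number so the comparison isn't a string comparison (FETCH FIRST requires Oracle 12c or later):

with
-- begin test data (same as the question)
raw_data (sales) as
(
    select '22,19,20,23,16,14,15,15,18,21,24,10,17' from dual
),
derived_tbl (week, sales) as
(
    select level, to_number(regexp_substr(sales, '([[:digit:]]+)(,|$)', 1, level, null, 1))
    from raw_data connect by level <= regexp_count(sales, ',') + 1
),
-- end test data
flagged as
(
    -- a new group starts whenever sales drop below the previous week
    select week, sales,
           case when sales >= lag(sales) over (order by week) then 0 else 1 end brk
    from derived_tbl
),
grouped as
(
    -- the running sum of break flags labels each non-decreasing island
    select week, sales, sum(brk) over (order by week) grp
    from flagged
)
select min(week) start_week, max(week) end_week, count(*) run_length
from grouped
group by grp
order by run_length desc
fetch first 1 row only;

For the sample data this returns weeks 6 through 11 with a run length of 6, matching the expected answer, and every week is visited exactly once.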

SQL Query for 7 Day Rolling Average in SQL Server

I have a table of hourly product usage (how many times the product is used) data –
ID (bigint)| ProductId (tinyint)| Date (int - YYYYMMDD) | Hour (tinyint)| UsageCount (int)
#|1 | 20140901 | 0 | 10
#|1 | 20140901 | 1 | 15
#|1 | 20140902 | 5 | 25
#|1 | 20140903 | 5 | 25
#|1 | 20140904 | 3 | 25
#|1 | 20140905 | 7 | 25
#|1 | 20140906 | 10 | 25
#|1 | 20140907 | 9 | 25
#|1 | 20140908 | 5 | 25
#|2 | 20140903 | 16 | 10
#|2 | 20140903 | 13 | 115
Likewise, I have the usage data for 4 different products (ProductId from 1 through 4) stored for every hour in the product_usage table. As you can imagine, it is constantly growing as the nightly ETL process dumps the data for the entire previous day. If a product is not used on any hour of a day, the record for that hour won’t appear in this table. Similarly, if a product is not used for the entire day, there won’t be any record for that day in the table. I need to generate a report that gives daily usage and last 7 days’ rolling average –
For example:
ProductId | Date | DailyUsage | RollingAverage
1 | 20140901 | sum of usages of that day | (Sum of usages from 20140901 through 20140826) / 7
1 | 20140902 | sum of usages of that day | (Sum of usages from 20140902 through 20140827) / 7
2 | 20140902 | sum of usages of that day | (Sum of usages from 20140902 through 20140827) / 7
And so on..
I am planning to create an Indexed View in SQL server 2014. Can you think of an efficient SQL query to do this?
Try:
select x.*,
avg(dailyusage) over(partition by productid order by productid, date rows between 6 preceding and current row) as rolling_avg
from (select productid, date, sum(usagecount) as dailyusage
from tbl
group by productid, date) x
Fiddle:
http://sqlfiddle.com/#!6/f674a7/4/0
Replace "avg(dailusage) over...." with sum (rather than avg) if what you really want is the sum for the past week. In your title you say you want the average but later you say you want the sum. The query should be the same other than that, so use whichever you actually want.
As was pointed out by Gordon, this is basically the average over the last 7 dates on which the product was used (6 preceding rows plus the current one), which might span more than just the past 7 days if there are days without any rows for that product because it wasn't used at all. To get around that you could use a date table and your products table.
You have to be careful if you can be missing data on some days. If I assume that there is data for some product on each day, then this approach will work:
select p.productid, d.date, sum(h.usagecount) as DailyUsage,
       sum(sum(h.usagecount)) over (partition by p.productid order by d.date
           rows between 6 preceding and current row) as Sum7day
from (select distinct productid from hourly) p cross join
     (select distinct date from hourly) d left join
     hourly h
     on h.productid = p.productid and h.date = d.date
group by p.productid, d.date;
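If the goal is the 7-day rolling average rather than the sum, the same query shape works; a sketch under the same assumptions, dividing the windowed total by 7.0:

-- Days with no usage contribute NULL to the window SUM, which SUM ignores,
-- so they effectively count as zero toward the 7-day average.
select p.productid, d.date, sum(h.usagecount) as DailyUsage,
       sum(sum(h.usagecount)) over (partition by p.productid order by d.date
           rows between 6 preceding and current row) / 7.0 as Avg7day
from (select distinct productid from hourly) p cross join
     (select distinct date from hourly) d left join
     hourly h
     on h.productid = p.productid and h.date = d.date
group by p.productid, d.date;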