sql repeat rows for weekend and holidays - sql

I have a table A that we import based on the day a file lands in a location. We don't receive files on weekends and public holidays, and the table holds data for multiple countries, so the public holidays vary. In essence, we are looking to duplicate a row until it encounters the next record for that ID (unless it's the max date for that ID). A typical record looks like this:
Account Datekey Balance
1 20181012 100
1 20181112 100
1 20181212 100
1 20181512 100
1 20181712 100
And needs to look like this (sat, sun & PH added to indicate the day of week):
Account Datekey Balance
1 20181012 100
1 20181112 100
1 20181212 100
1 20181312 100 Sat
1 20181412 100 Sun
1 20181512 100
1 20181612 100 PH
1 20181712 100
Also, Datekey is numeric and not a date. I tried a couple of suggested solutions but found that they simply duplicate the previous row multiple times without stopping when the next date's record is found. I need to run it as an update query that would execute daily on table A and add missing records when it's executed (sometimes 2 or 3 days later).
Hope you can assist.
Thanks

This question has multiple parts:
Converting an obscene date format to a date
Generating "in-between" rows
Filling in the new rows with the previous value
Determining the day of the week
The following does most of this. I refuse to regenerate the datekey format. You really need to fix that.
This also assumes that your settings are for English weekday names.
with t as (
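select Account, Datekey, Balance,
-- Datekey is numeric yyyyddMM: cast it to a string, then reorder to yyyyMMdd
convert(date, left(k.dkey, 4) + right(k.dkey, 2) + substring(k.dkey, 5, 2)) as proper_date
from yourtable cross apply (select cast(Datekey as varchar(8)) as dkey) k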
),
dates as (
select account, min(proper_date) as dte, max(proper_date) as max_dte
from t
group by account
union all
select account, dateadd(day, 1, dte), max_dte
from dates
where dte < max_dte
)
select d.account, d.dte, t.balance,
(case when t.proper_date = d.dte then '' -- a real record exists for this day
when datename(weekday, d.dte) in ('Saturday', 'Sunday')
then left(datename(weekday, d.dte), 3)
else 'PH'
end) as indicator
from dates d cross apply
(select top (1) t.*
from t
where t.account = d.account and
t.proper_date <= d.dte
order by t.proper_date desc
) t
option (maxrecursion 0);
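The same gap-filling logic can be sketched outside the database. The following is a minimal Python illustration, not the answer's method: `parse_datekey` and `fill_gaps` are hypothetical names, the key is assumed to be yyyyddMM as in the sample, and any missing weekday is assumed to be a public holiday.

```python
from datetime import date, timedelta


def parse_datekey(key: int) -> date:
    """Parse a numeric yyyyddMM key, e.g. 20181012 is 10 December 2018."""
    s = f"{key:08d}"
    return date(int(s[:4]), int(s[6:]), int(s[4:6]))


def fill_gaps(rows):
    """rows: (account, datekey, balance) tuples for a single account.

    Returns (date, balance, label) for every calendar day between the first
    and last record. Missing days repeat the previous balance and are
    labelled 'Sat', 'Sun', or 'PH' (assumed: any missing weekday is a holiday).
    """
    known = {parse_datekey(key): balance for _, key, balance in rows}
    day, last = min(known), max(known)
    out, balance = [], None
    while day <= last:
        if day in known:
            balance, label = known[day], ""
        elif day.weekday() == 5:
            label = "Sat"
        elif day.weekday() == 6:
            label = "Sun"
        else:
            label = "PH"
        out.append((day, balance, label))
        day += timedelta(days=1)
    return out
```

Because the loop stops at the last known record, rows are never repeated past the max date for the account, which is the stopping condition the question asks for.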

Related

SQL - Get historic count of rows collected within a certain period by date

For many years I've been collecting data and I'm interested in knowing the historic counts of IDs that appeared in the last 30 days. The source looks like this
id dates
1 2002-01-01
2 2002-01-01
3 2002-01-01
... ...
3 2023-01-10
If I wanted to know the historic count of ids that appeared in the last 30 days I would do something like this
with total_counter as (
select id, count(id) counts
from source
group by id
),
unique_obs as (
select id
from source
where dates >= DATEADD(Day ,-30, current_date)
group by id
)
select count(distinct(id))
from unique_obs
left join total_counter
on total_counter.id = unique_obs.id;
The problem is that this returns a single result: today's count, as determined by current_date.
I would like to see a table with such counts as if, for example, I had run this analysis yesterday, the day before, and so on. So the expected result would be something like
counts date
1235 2023-01-10
1234 2023-01-09
1265 2023-01-08
... ...
7383 2022-12-11
so for example, let's say that if the current_date was 2023-01-10, my query would've returned 1235.
If you need a distinct count of Ids from the 30 days up to and including each date the below should work
WITH CTE_DATES
AS
(
--Create a list of anchor dates
SELECT DISTINCT
dates
FROM source
)
SELECT COUNT(DISTINCT s.id) AS "counts"
,D.dates AS "date"
FROM CTE_DATES D
LEFT JOIN source S ON S.dates BETWEEN DATEADD(DAY,-29,D.dates) AND D.dates --30 DAYS INCLUSIVE
GROUP BY D.dates
ORDER BY D.dates DESC
;
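To make the window arithmetic concrete, here is a small Python sketch of the same idea: one anchor per distinct date, and a distinct count of ids over the 30 days ending on that anchor. The function name and the in-memory row format are assumptions for illustration only.

```python
from datetime import date, timedelta


def rolling_distinct_counts(rows, window_days=30):
    """rows: iterable of (id, date) pairs.

    Returns {anchor_date: number of distinct ids seen in the window_days days
    up to and including anchor_date}, with one entry per date in the data.
    """
    ids_by_day = {}
    for id_, day in rows:
        ids_by_day.setdefault(day, set()).add(id_)

    result = {}
    for anchor in ids_by_day:
        # 30 days inclusive means going back 29 days from the anchor.
        start = anchor - timedelta(days=window_days - 1)
        seen = set()
        for day, ids in ids_by_day.items():
            if start <= day <= anchor:
                seen |= ids
        result[anchor] = len(seen)
    return result
```

Like the query, this only produces anchors for dates that actually occur in the data; a day with no rows gets no output row.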
If the distinct count didn't matter, you could likely simplify with a rolling sum, only hitting the source table once:
SELECT S.dates AS "date"
,COUNT(1) AS "count_daily"
,SUM(COUNT(1)) OVER(ORDER BY S.dates DESC ROWS BETWEEN CURRENT ROW AND 29 FOLLOWING) AS "count_rolling" --assumes there is at least one row for every day; the aggregate is repeated because most engines cannot reference the "count_daily" alias here
FROM source S
GROUP BY S.dates
ORDER BY S.dates DESC;
This won't work, though, if you have gaps in your list of dates, as it'll just include the latest 30 days available. In that case, the first example without DISTINCT in the count will do the trick.
SELECT count(*) AS Counts,
dates AS Date
FROM source
WHERE dates >= DATEADD(DAY, -30, CURRENT_DATE)
GROUP BY dates
ORDER BY dates DESC

Finding Active Clients By Date

I'm having trouble writing a recursive function that would count the number of active clients on any given day.
Say I have a table like this:
Client  Start Date  End Date
1       1-Jan-22
2       1-Jan-22    3-Jan-22
3       3-Jan-22
4       4-Jan-22    5-Jan-22
5       4-Jan-22    6-Jan-22
6       7-Jan-22    9-Jan-22
I want to return a table that would look like this:
Date      NumActive
1-Jan-22  2
2-Jan-22  2
3-Jan-22  3
4-Jan-22  4
5-Jan-22  4
6-Jan-22  3
7-Jan-22  3
8-Jan-22  3
9-Jan-22  4
Is there a way to do this? Ideally, I'd have a fixed start date and go to today's date.
Some pieces I have tried:
Creating a recursive date table
Truncated to Feb 1, 2022 for simplicity:
WITH DateDiffs AS (
SELECT DATEDIFF(DAY, '2022-02-02', GETDATE()) AS NumDays
)
, Numbers(Numbers) AS (
SELECT MAX(NumDays) FROM DateDiffs
UNION ALL
SELECT Numbers-1 FROM Numbers WHERE Numbers > 0
)
, Dates AS (
SELECT
Numbers
, DATEADD(DAY, -Numbers, CAST(GETDATE() -1 AS DATE)) AS [Date]
FROM Numbers
)
I would like to be able to loop over the dates in that table, such as by running the query below once per date with a variable like @loopdate, then UNION ALL the results into a larger final query.
I'm now stuck as to how I can run the query to count the number of active users:
SELECT
COUNT(Client)
FROM clients
WHERE [Start Date] >= @loopdate AND ([End Date] <= @loopdate OR [End Date] IS NULL)
Thank you!
You don't need anything recursive in this particular case. At a minimum you need a list of dates in the range you want to report on, ideally a permanent calendar table.
For purposes of demonstration, you can create one on the fly and apply your count to each date in it, like so:
with dates as (
select top(9)
Convert(date,DateAdd(day, -1 + Row_Number() over(order by (select null)), '20220101')) dt
from master.dbo.spt_values
)
select d.dt [Date], c.NumActive
from dates d
outer apply (
select Count(*) NumActive
from t
where d.dt >= t.StartDate and (d.dt <= t.EndDate or t.EndDate is null)
)c
See this Demo Fiddle
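The per-date predicate used above (a client is active when the date is on or after the start and on or before the end, or the end is missing) can be sketched in Python as follows. This is only an illustration of the counting rule; `active_per_day` and the tuple layout are assumed names, not part of the answer.

```python
from datetime import date, timedelta


def active_per_day(clients, first, last):
    """clients: (client, start, end_or_None) tuples.

    For each day from first to last (inclusive), count the clients whose
    [start, end] range covers that day. An end of None means still active.
    """
    counts = {}
    day = first
    while day <= last:
        counts[day] = sum(
            1
            for _, start, end in clients
            if start <= day and (end is None or day <= end)
        )
        day += timedelta(days=1)
    return counts
```

Note that the comparison directions are the reverse of those in the question's attempt: the start date must be on or before the day being counted, and the end date on or after it.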

How to setup a cumulative count grouped by month with underlying conditions

I've come across somewhat of an interesting scenario where I'm needing to aggregate enrollment counts and group them by the individual month and all subsequent months leading up to the completion date. The starting counter will be placed into the month when the enrollment began, and now I'm needing to set up a cumulative sum to carry out the single count.
Here's a couple of test records I'm working with
I've set up the following query to compile the date_month CTE to compile the full 12 months derived from my Start/End Range variables. I've then joined it to my test table in order to establish the Counter placements.
DECLARE @EnrollmentDateStart DATETIME = '2020-01-01'
DECLARE @EnrollmentDateEnd DATETIME = '2020-12-01'
;WITH CTE_Months(year_month) AS
(
SELECT DATEADD(MONTH, n, DATEADD(MONTH, DATEDIFF(MONTH, 0, @EnrollmentDateStart), 0))
FROM ( SELECT TOP (DATEDIFF(MONTH, @EnrollmentDateStart, @EnrollmentDateEnd) + 1)
n = ROW_NUMBER() OVER (ORDER BY [object_id]) - 1
FROM sys.all_objects ORDER BY [object_id] ) AS n
)
SELECT
[Year] = YEAR(cm.year_month),
[Month] = DATENAME(MONTH, cm.year_month),
SUM(IIF(tt.[Enrollment Start Date] >= @EnrollmentDateStart,1,0)) AS EnrollmentCount
FROM CTE_Months cm
LEFT OUTER JOIN #TMP_Testing_Table tt
ON tt.[Enrollment Start Date] >= cm.year_month
AND tt.[Enrollment Start Date] < DATEADD(MONTH, 1, cm.year_month)
GROUP BY tt.Department, cm.year_month
At this stage, I'm pulling back the following results, so I now have the Enrollment Counts placed into the correct starting months derived from the Enrollment Start Date.
Now I'm trying to figure out what would be the best course of action to place the subsequent count for the additional months leading up to the Completion date?
For example - The first User (UserId: 1) was Enrolled in March, 2020, and Completed in August, 2020, so essentially I'm looking to produce the following result to reflect the number of months ranging between March <> July (Last month prior to Completion)
January: 0
February: 0
March: 1
April: 1
May: 1
June: 1
July: 1
August: 0
September: 0
October: 0
November: 0
December: 0
Thinking a cumulative total should be able to address the subsequent for the month by month range, however, I would then need to zero out the total for all subsequent months on and after the recorded Completion date for this record in question.
Seeing if I can get your thoughts/suggestions on how to address this scenario? Apologies if the information/explanation is confusing, but please let me know, and I'll do my best to elaborate.
....................
SELECT
[Year] = YEAR(cm.year_month),
[Month] = DATENAME(MONTH, cm.year_month),
count(tt.userid) AS EnrollmentCount
FROM CTE_Months cm
LEFT OUTER JOIN #TMP_Testing_Table tt on cm.year_month > eomonth([Enrollment Start Date], -1)
and cm.year_month <= tt.[Enrollment End Date]
GROUP BY cm.year_month
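As a rough cross-check of the month-range join, the Python sketch below counts each enrollment in every month from its start month up to, but not including, its completion month, which is the behaviour the question's expected output describes (March through July for a March enrollment completed in August). The function names and the sample dates in the usage note are hypothetical, since the original test records are not shown.

```python
from datetime import date


def month_starts(first: date, last: date):
    """Yield the first day of each month from first's month through last's month."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def enrolled_per_month(enrollments, first, last):
    """enrollments: (start, completion_or_None) date pairs.

    A record counts in every month from its start month up to, but not
    including, its completion month (assumed rule, taken from the question).
    """
    counts = {}
    for month in month_starts(first, last):
        key = (month.year, month.month)
        counts[month] = sum(
            1
            for start, done in enrollments
            if (start.year, start.month) <= key
            and (done is None or key < (done.year, done.month))
        )
    return counts
```

For example, `enrolled_per_month([(date(2020, 3, 10), date(2020, 8, 5))], date(2020, 1, 1), date(2020, 12, 1))` gives a count of 1 for March through July and 0 for the other months. Whether the completion month itself should count depends on how the end-date comparison is written, so it is worth checking that boundary against real data.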

Pending Monthly SQL Counts

The below query returns accurate info, I just haven't had any luck trying to make this:
1) More dynamic so I'm not repeating the same line of code every month
2) Formatted differently, so just 2 columns of month + year are needed to view pending counts by field1 + field2
Example code (basically, sum when (OPEN date is before/on the last day of the month) and (CLOSE date comes after the month OR it's still opened)
SELECT
SUM(CAST(case when OPENDATE <= '2014-11-30 23:59:59'
and ((CLOSED >= '2014-12-01')
or (CLOSED is null)) then '1' else '0' end as int)) Nov14
,SUM(CAST(case when OPENDATE <= '2014-12-31 23:59:59'
and ((CLOSED >= '2015-01-01')
or (CLOSED is null)) then '1' else '0' end as int)) Dec14
,SUM(CAST(case when OPENDATE <= '2015-01-31 23:59:59'
and ((CLOSED >= '2015-02-01')
or (CLOSED is null)) then '1' else '0' end as int)) Jan15
,FIELD1,FIELD2
FROM T
GROUP BY FIELD1,FIELD2
Results:
FIELD1 FIELD2 NOV14 DEC14 JAN15
A A 2 5 7
A B 6 8 4
C A 5 6 5
…
Instead of:
COUNT FIELD1 FIELD2 MO YR
14 A A 12 2014
18 A B 12 2014
16 C A 1 2015
...
Is there a way to get this in one shot? Sorry if this is a repeat topic, I've looked at some boards and they've helped me get closing counts.. but using a range between two date fields, I haven't had any luck.
Thanks in advance
One way to do it is to use a table of numbers or calendar table.
In the code below the table Numbers has a column Number, which contains integer numbers starting from 1. There are many ways to generate such table.
You can do it on the fly, or have the actual table. I personally have such table in the database with 100,000 rows.
The first CROSS APPLY effectively creates a column CurrentMonth, so that I don't have to repeat the call to DATEADD many times later.
Second CROSS APPLY is your query that you want to run for each month. It can be as complicated as needed, it can return more than one row if needed.
-- Start and end dates should be the first day of the month
DECLARE @StartDate date = '20141201';
DECLARE @EndDate date = '20150201';
SELECT
CurrentMonth
,FIELD1
,FIELD2
,Counts
FROM
Numbers
CROSS APPLY
(
SELECT DATEADD(month, Numbers.Number-1, @StartDate) AS CurrentMonth
) AS CA_Month
CROSS APPLY
(
SELECT
FIELD1
,FIELD2
,COUNT(*) AS Counts
FROM T
WHERE
OPENDATE < CurrentMonth
AND (CLOSED >= CurrentMonth OR CLOSED IS NULL)
GROUP BY
FIELD1
,FIELD2
) AS CA
WHERE
Numbers.Number < DATEDIFF(month, @StartDate, @EndDate) + 1
;
If you provide a table with sample data and expected output, I could verify that the query produces correct results.
The solution is written in SQL Server 2008.
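The per-month predicate in the second CROSS APPLY can be illustrated with a short Python sketch. It assumes each month is represented by its first day, as in the query, and that a case is pending at that boundary when it was opened before it and closed on or after it (or never). `pending_by_month` and the tuple layout are hypothetical names for illustration.

```python
from datetime import date


def pending_by_month(cases, month_firsts):
    """cases: (field1, field2, opened, closed_or_None) tuples.
    month_firsts: first-of-month dates to report on.

    Returns {(month_first, field1, field2): count of cases that were opened
    before month_first and closed on or after it, or not closed at all}.
    """
    result = {}
    for boundary in month_firsts:
        for field1, field2, opened, closed in cases:
            if opened < boundary and (closed is None or closed >= boundary):
                key = (boundary, field1, field2)
                result[key] = result.get(key, 0) + 1
    return result
```

As with the SQL, a group with no pending cases at a boundary simply produces no entry rather than a zero.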
Like this:
SELECT
FIELD1,FIELD2,datepart(month, OPENDATE), datepart(year, OPENDATE), sum(1)
FROM T
GROUP BY FIELD1,FIELD2, datepart(month, OPENDATE), datepart(year, OPENDATE)
But this of course is just based on OPENDATE, if you need to have the same thing calculated into several months, that's going to be more difficult, and you'll probably need a calendar "table" that you'll have to cross apply with this data.

Count occurrences of combinations of columns

I have daily time series (actually business days) for different companies and I work with PostgreSQL. There is also an indicator variable (called flag) taking the value 0 most of the time, and 1 on some rare event days. If the indicator variable takes the value 1 for a company, I want to further investigate the entries from two days before to one day after that event for the corresponding company. Let me refer to that as [-2,1] window with the event day being day 0.
I am using the following query
CREATE TABLE test AS
WITH cte AS (
SELECT *
, MAX(flag) OVER(PARTITION BY company ORDER BY day
ROWS BETWEEN 1 preceding AND 2 following) Lead1
FROM mytable)
SELECT *
FROM cte
WHERE Lead1 = 1
ORDER BY day,company
The query takes the entries ranging from 2 days before the event to one day after the event, for the company experiencing the event.
The query does that for all events.
This is a small section of the resulting table.
day company flag
2012-01-23 A 0
2012-01-24 A 0
2012-01-25 A 1
2012-01-25 B 0
2012-01-26 A 0
2012-01-26 B 0
2012-01-27 B 1
2012-01-30 B 0
2013-01-10 A 0
2013-01-11 A 0
2013-01-14 A 1
Now I want to do further calculations for every [-2,1] window separately. So I need a variable that allows me to identify each [-2,1] window. The idea is that I count the number of windows for every company with the variable "occur", so that in further calculations I can use the clause
GROUP BY company, occur
Therefore my desired output looks like that:
day company flag occur
2012-01-23 A 0 1
2012-01-24 A 0 1
2012-01-25 A 1 1
2012-01-25 B 0 1
2012-01-26 A 0 1
2012-01-26 B 0 1
2012-01-27 B 1 1
2012-01-30 B 0 1
2013-01-10 A 0 2
2013-01-11 A 0 2
2013-01-14 A 1 2
In the example, the company B only occurs once (occur = 1). But the company A occurs two times. For the first time from 2012-01-23 to 2012-01-26. And for the second time from 2013-01-10 to 2013-01-14. The second time range of company A does not consist of all four days surrounding the event day (-2,-1,0,1) since the company leaves the dataset before the end of that time range.
As I said, I am working with business days. I don't care about holidays; I have data from Monday to Friday. Earlier I wrote the following function:
CREATE OR REPLACE FUNCTION addbusinessdays(date, integer)
RETURNS date AS
$BODY$
WITH alldates AS (
SELECT i,
$1 + (i * CASE WHEN $2 < 0 THEN -1 ELSE 1 END) AS date
FROM generate_series(0,(ABS($2) + 5)*2) i
),
days AS (
SELECT i, date, EXTRACT('dow' FROM date) AS dow
FROM alldates
),
businessdays AS (
SELECT i, date, d.dow FROM days d
WHERE d.dow BETWEEN 1 AND 5
ORDER BY i
)
-- adding business days to a date --
SELECT date FROM businessdays WHERE
CASE WHEN $2 > 0 THEN date >=$1 WHEN $2 < 0
THEN date <=$1 ELSE date =$1 END
LIMIT 1
offset ABS($2)
$BODY$
LANGUAGE 'sql' VOLATILE;
It can add/subtract business days from a given date and works like this:
select * from addbusinessdays('2013-01-14',-2)
delivers the result 2013-01-10. So in Jakub's approach we can change the second and third last line to
w.day BETWEEN addbusinessdays(t1.day, -2) AND addbusinessdays(t1.day, 1)
and can deal with the business days.
Function
While using the function addbusinessdays(), consider this instead:
CREATE OR REPLACE FUNCTION addbusinessdays(date, integer)
RETURNS date AS
$func$
SELECT day
FROM (
SELECT i, $1 + i * sign($2)::int AS day
FROM generate_series(0, ((abs($2) * 7) / 5) + 3) i
) sub
WHERE EXTRACT(ISODOW FROM day) < 6 -- truncate weekend
ORDER BY i
OFFSET abs($2)
LIMIT 1
$func$ LANGUAGE sql IMMUTABLE;
Major points
Never quote the language name sql. It's an identifier, not a string.
Why was the function VOLATILE? Make it IMMUTABLE for better performance in repeated use and more options (like using it in a functional index).
(ABS($2) + 5)*2) is way too much padding. Replace with ((abs($2) * 7) / 5) + 3).
Multiple levels of CTEs were useless cruft.
ORDER BY in last CTE was useless, too.
As mentioned in my previous answer, extract(ISODOW FROM ...) is more convenient to truncate weekends.
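For readers who want to check the function's behaviour outside PostgreSQL, here is a Python sketch of the same stepping logic: walk day by day in the direction of the sign, skip weekends, and return the business day at offset abs(n). It is an approximation written for illustration (`add_business_days` is an assumed name), and it ignores holidays, as the SQL does.

```python
from datetime import date, timedelta
from typing import Optional


def add_business_days(start: date, n: int) -> Optional[date]:
    """Return the business day that is n business days from start.

    Mirrors the SQL: candidate days are start, start +/- 1, ...; weekends are
    dropped; the row at OFFSET abs(n) is returned. With n == 0 and a weekend
    start there is no candidate row, so None is returned.
    """
    step = (n > 0) - (n < 0)  # sign of n: -1, 0, or 1
    remaining = abs(n)
    day = start
    while True:
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            if remaining == 0:
                return day
            remaining -= 1
        if step == 0:
            return None
        day += timedelta(days=step)
```

With the example from the question, `add_business_days(date(2013, 1, 14), -2)` steps back over the weekend and lands on 2013-01-10.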
Query
That said, I wouldn't use above function for this query at all. Build a complete grid of relevant days once instead of calculating the range of days for every single row.
Based on this assertion in a comment (should be in the question, really!):
two subsequent windows of the same firm can never overlap.
WITH range AS ( -- only with flag
SELECT company
, min(day) - 2 AS r_start
, max(day) + 1 AS r_stop
FROM tbl t
WHERE flag <> 0
GROUP BY 1
)
, grid AS (
SELECT company, day::date
FROM range r
,generate_series(r.r_start, r.r_stop, interval '1d') d(day)
WHERE extract('ISODOW' FROM d.day) < 6
)
SELECT *, sum(flag) OVER(PARTITION BY company ORDER BY day
ROWS BETWEEN UNBOUNDED PRECEDING
AND 2 following) AS window_nr
FROM (
SELECT t.*, max(t.flag) OVER(PARTITION BY g.company ORDER BY g.day
ROWS BETWEEN 1 preceding
AND 2 following) in_window
FROM grid g
LEFT JOIN tbl t USING (company, day)
) sub
WHERE in_window > 0 -- only rows in [-2,1] window
AND day IS NOT NULL -- exclude missing days in [-2,1] window
ORDER BY company, day;
How?
Build a grid of all business days: CTE grid.
To keep the grid to its smallest possible size, extract minimum and maximum (plus buffer) day per company: CTE range.
LEFT JOIN actual rows to it. Now the frames for the ensuing window functions work with static numbers.
To get distinct numbers per flag and company (window_nr), just count flags from the start of the grid (taking buffers into account).
Only keep days inside your [-2,1] windows (in_window > 0).
Only keep days with actual rows in the table.
Voilà.
SQL Fiddle.
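The numbering step can also be sketched in Python. This is an illustration under two assumptions taken from the thread: each company has one row per business day, so row offsets stand in for business-day offsets, and two windows of the same company never overlap. `number_windows` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date


def number_windows(rows, before=2, after=1):
    """rows: (day, company, flag) tuples, one per business day and company.

    Returns (day, company, flag, occur) for every row that falls inside a
    [-before, +after] window around a flagged day, where occur numbers the
    windows per company in date order.
    """
    by_company = defaultdict(list)
    for day, company, flag in rows:
        by_company[company].append((day, flag))

    out = []
    for company, series in by_company.items():
        series.sort()
        occur = 0
        labels = {}  # row index -> window number
        for idx, (_, flag) in enumerate(series):
            if flag == 1:
                occur += 1
                lo = max(0, idx - before)
                hi = min(len(series), idx + after + 1)
                for j in range(lo, hi):
                    labels[j] = occur
        out.extend(
            (series[j][0], company, series[j][1], n)
            for j, n in sorted(labels.items())
        )
    return sorted(out)
```

Clamping the range at the ends of each company's series covers the case described in the question where a company leaves the dataset before its window is complete.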
Basically, the strategy is to first enumerate the flag days and then join the other rows to them:
WITH windows AS(
SELECT t1.day
,t1.company
,rank() OVER (PARTITION BY company ORDER BY day) as rank
FROM table1 t1
WHERE flag =1)
SELECT t1.day
,t1.company
,t1.flag
,w.rank
FROM table1 AS t1
JOIN windows AS w
ON
t1.company = w.company
AND
w.day BETWEEN
t1.day - interval '2 day' AND t1.day + interval '1 day'
ORDER BY t1.day, t1.company;
Fiddle.
However there is a problem with work days as those can mean whatever (do holidays count?).