SQL Server : create summarization based on multiple dates - sql

I have the following table containing worker positions going back 10 years:
worker_id  position_code  date_from   date_to
---------  -------------  ----------  ----------
1          x1             2021-01-01  2100-12-31
1          x2             2020-12-01  2021-01-01
2          x3             2000-01-01  2100-12-31
I want to create a view that shows, for each worker, their position in every month.
For example:
year  month  worker_id  position_code
----  -----  ---------  -------------
2020  12     1          x2
2020  12     2          x3
2021  1      1          x1
2021  1      2          x3
2021  2      1          x1
Ideally I'm only interested in the last 6 months, for better performance.
Overall there are ~10000 workers, and the table itself has ~100000 rows.
Some workers have only one position, but there can be several.
In theory a position only changes at the beginning of a month, but it would be better to handle mid-month changes as well, and in that case take the position that is active at the end of the month.
(For example: if the position is x1 from Jan 1 to Jan 10 and x2 from Jan 10 to Jan 31, then x2 is the one I'm looking for.)

WITH WORKERS(worker_id, position_code, date_from, date_to) AS
(
SELECT 1 , 'x1', '2021-01-01', '2100-12-31' UNION ALL
SELECT 1 , 'x2' , '2020-12-01', '2021-01-01' UNION ALL
SELECT 2 , 'x3' , '2000-01-01' , '2100-12-31'
),
MINI_MAX AS
(
SELECT MIN(DATE_FROM)AS STARTT_DATE,MAX(DATE_TO)AS END_DATE
FROM WORKERS
),
CALENDAR AS
(
SELECT CAST(STARTT_DATE AS DATE)DATE_D FROM MINI_MAX AS W
UNION ALL
SELECT DATEADD(MONTH,1,Z.DATE_D)
FROM CALENDAR AS Z
WHERE Z.DATE_D<=(SELECT END_DATE FROM MINI_MAX)
),
RESULT AS
(
SELECT YEAR(C.DATE_D)AS YEARR,MONTH(C.DATE_D)MONTHH,W.worker_id,W.position_code
FROM CALENDAR AS C
JOIN WORKERS AS W ON C.DATE_D BETWEEN W.date_from AND W.date_to
)
SELECT R.YEARR,R.MONTHH,R.worker_id,R.position_code
FROM RESULT AS R
OPTION(MAXRECURSION 0)
I would say that the most suitable approach for this kind of query is to use a permanent calendar table and JOIN directly to it.

The hard part is generating the months. One method is a recursive CTE:
with cte as (
select worker_id, position_code, date_from as dte,
eomonth(case when date_to < eomonth(getdate()) then dateadd(day, -1, date_to) else getdate() end) as date_to
from t
union all
select worker_id, position_code,
dateadd(month, 1, datefromparts(year(dte), month(dte), 1)), date_to
from cte
where eomonth(dte) < eomonth(date_to)
)
select *
from cte
order by worker_id, dte desc
option (maxrecursion 0)
Note: You might get duplicates if a worker starts a position in the middle of a month.
Here is a db<>fiddle.
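The duplicates mentioned in the note can be avoided by keeping, for each month, only the position whose interval covers the last day of that month, which is the rule the question asks for. The following is a minimal Python sketch of that rule, not a translation of the SQL above; the helper names `month_end` and `monthly_positions` are hypothetical, and the data is the sample from the question.

```python
from datetime import date, timedelta

# Sample rows from the question: (worker_id, position_code, date_from, date_to)
positions = [
    (1, "x1", date(2021, 1, 1), date(2100, 12, 31)),
    (1, "x2", date(2020, 12, 1), date(2021, 1, 1)),
    (2, "x3", date(2000, 1, 1), date(2100, 12, 31)),
]

def month_end(year, month):
    """Last calendar day of the given month."""
    first_of_next = date(year + month // 12, month % 12 + 1, 1)
    return first_of_next - timedelta(days=1)

def monthly_positions(rows, first, last):
    """Yield (year, month, worker_id, position_code) for each month from
    `first` to `last` (both are (year, month) tuples, inclusive), using the
    position whose interval covers the last day of the month."""
    year, month = first
    while (year, month) <= last:
        end = month_end(year, month)
        for worker_id, code, date_from, date_to in sorted(rows):
            if date_from <= end <= date_to:
                yield (year, month, worker_id, code)
        # Advance to the next month, rolling over the year after December
        year, month = year + month // 12, month % 12 + 1

result = list(monthly_positions(positions, (2020, 12), (2021, 2)))
for row in result:
    print(row)
```

Restricting `first` and `last` to the most recent six months keeps the generated calendar small, which matches the performance concern in the question.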

Related

Split date into month and year based on number of months passed in stored procedure into a temp table

I have a stored procedure which takes a number of months as a parameter. My query has a where clause like this:
select salesrepid, month(salesdate), year(salesdate), salespercentage
from SalesRecords
where salesdate >= DATEADD(month, -@NumberOfMonths, getdate())
So, for example, if @NumberOfMonths = 3 and based on today's date,
it should return September (9), October (10) and November (11) in my result set. My query does that, but I also need to return null for any sales rep who has no value for a month,
for example:
salerepid month year salespercentage
232 9 2020 80%
232 10 2020 null
232 11 2020 90%
How can I achieve this? Right now the query returns only two records and omits October because it has no value, but I want it to return October with a null value.
If I follow you correctly, you can generate all start of months within the target interval, and cross join that with the table to generate all possible combinations. Then you can bring the table with a left join:
with all_dates as (
select datefromparts(year(getdate()), month(getdate()), 1) salesdate, 0 lvl
union all
select dateadd(month, - lvl - 1, salesdate), lvl + 1
from all_dates
where lvl < @NumberOfMonths
)
select r.salesrepid, d.salesdate , s.salespercentage
from all_dates d
cross join (select distinct salesrepid from salesrecords) r
left join salesrecords s
on s.salesrepid = r.salesrepid
and s.salesdate >= d.salesdate
and s.salesdate < dateadd(month, 1, d.salesdate )
Your original query and result imply that there is at most one record per sales rep and month, so this works under the same assumption. If that's not the case (which would somehow make more sense), you would need aggregation in the outer query.
declare @numberofmonths int = 3;
with all_dates as (
select datefromparts(year(getdate()), month(getdate()), 1) dt, 0 lvl
union all
select dateadd(month, - lvl - 1, dt), lvl + 1
from all_dates
where lvl < 3
)
select * from all_dates
This gives me following result:
2020-11-01 0
2020-10-01 1
2020-08-01 2
2020-05-01 3
I want only:
2020-11-01 0
2020-10-01 1
2020-09-01 2
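The unexpected dates follow from the recursive step: each level subtracts lvl + 1 months from the previous row's date rather than from the starting date, so the offsets accumulate (1, then 1 + 2, then 1 + 2 + 3). A small Python simulation reproduces both the reported output and the desired one. The helper names (`add_months`, `months_accumulating`, `months_fixed`) are hypothetical and only model the date arithmetic, not SQL Server itself.

```python
from datetime import date

def add_months(d, n):
    """Shift a first-of-month date by n months (n may be negative)."""
    index = d.year * 12 + (d.month - 1) + n
    return date(index // 12, index % 12 + 1, 1)

def months_accumulating(start, count):
    """Mimics the recursive step dateadd(month, -lvl - 1, dt) with lvl < count."""
    out, dt, lvl = [start], start, 0
    while lvl < count:
        dt, lvl = add_months(dt, -lvl - 1), lvl + 1
        out.append(dt)
    return out

def months_fixed(start, count):
    """Steps back exactly one month per level and returns `count` rows."""
    out, dt, lvl = [start], start, 0
    while lvl < count - 1:
        dt, lvl = add_months(dt, -1), lvl + 1
        out.append(dt)
    return out

print(months_accumulating(date(2020, 11, 1), 3))  # Nov, Oct, Aug, May
print(months_fixed(date(2020, 11, 1), 3))         # Nov, Oct, Sep
```

In the CTE this would correspond to subtracting a constant one month per level, dateadd(month, -1, dt), and stopping one level earlier. That mapping is a suggestion, not something the original answer states.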

For each quarter between two dates, add rows quarter by quarter in SQL SERVER

I have a table, with types int, datetime, datetime:
id start date end date
-- ---------- ----------
1 2019-04-02 2020-09-17
2 2019-08-10 2020-08-10
Here is create/insert:
CREATE TABLE dbo.something
(
id int,
[start date] datetime,
[end date] datetime
);
INSERT dbo.something(id,[start date],[end date])
VALUES(1,'20190402','20200917'),(2,'20190810','20200810');
What is a SQL query that can produce these results:
id Year Quarter
-- ---- ----------
1 2019 2
1 2019 3
1 2019 4
1 2020 1
1 2020 2
1 2020 3
2 2019 3
2 2019 4
2 2020 1
2 2020 2
2 2020 3
Just use a recursive CTE. This version switches to counting quarters from year 0:
with cte as (
select id,
year(start_date) * 4 + datepart(quarter, start_date) - 1 as yyyyq,
year(end_date) * 4 + datepart(quarter, end_date) - 1 as end_yyyyq
from t
union all
select id, yyyyq + 1, end_yyyyq
from cte
where yyyyq < end_yyyyq
)
select id, yyyyq / 4 as year, (yyyyq % 4) + 1 as quarter
from cte;
Here is a db<>fiddle.
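The "quarters from year 0" trick maps each date to a single integer, year * 4 + (quarter - 1), so consecutive quarters differ by exactly 1 and the year and quarter can be recovered with integer division and modulo. Here is a short Python sketch of the same arithmetic, with hypothetical function names, run against the sample rows from the question:

```python
from datetime import date

def quarter_index(d):
    """Quarters elapsed since year 0: year * 4 + (quarter - 1)."""
    return d.year * 4 + (d.month - 1) // 3

def quarters_between(start, end):
    """List (year, quarter) for every quarter touched by [start, end]."""
    return [(q // 4, q % 4 + 1)
            for q in range(quarter_index(start), quarter_index(end) + 1)]

print(quarters_between(date(2019, 4, 2), date(2020, 9, 17)))
print(quarters_between(date(2019, 8, 10), date(2020, 8, 10)))
```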
If you cannot make another reference table/etc, you can use DATEDIFF (and DATEPART) using quarters, and then some simple date arithmetic.
The logic below is simply to find, for each startdate, the first quarter and then the number of additional quarters to get to the maximum. Then do a SELECT where the additional quarters are added to the startdate, to get each quarter.
The hardest part of the query to understand imo is the WITH numberlist section - all this does is generate a series of integers between 0 and the maximum number of quarters difference. If you already have a numbers table, you can use that instead.
Key code part is below, and here's a full DB_Fiddle with some additional test data.
CREATE TABLE #yourtable (id int, startdate date, enddate date)
INSERT INTO #yourtable (id, startdate, enddate) VALUES
(1, '2019-04-02', '2020-09-17'),
(2, '2019-08-10', '2020-08-20')
; WITH number_list AS
-- list of ints from 0 to maximum number of quarters
(SELECT n
FROM (SELECT ones.n + 10*tens.n AS n
FROM (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) ones(n),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) tens(n)
) AS a
WHERE n <= (SELECT MAX(DATEDIFF(quarter,startdate,enddate)) FROM #yourtable)
)
SELECT id,
YEAR(DATEADD(quarter, number_list.n, startdate)) AS [Year],
DATEPART(quarter, DATEADD(quarter, number_list.n, startdate)) AS [Quarter]
FROM (SELECT id, startdate, DATEDIFF(quarter,startdate,enddate) AS num_additional_quarters FROM #yourtable) yt
CROSS JOIN number_list
WHERE number_list.n <= yt.num_additional_quarters
DROP TABLE #yourtable
First create a date dimension table that contains each date with its corresponding quarter and year. Then use the query below to get the result. Adjust the column and table names to match your schema.
with q_date as
(
select 1 as id, '2019-04-02' :: date as start_date, '2020-09-17' :: date as end_date
UNION ALL
select 2 as id, '2019-08-10' :: date as start_date, '2020-08-10' :: date as end_date
)
select qd.id, dd.calendar_year, dd.calendar_quarter_number
from dim_date dd, q_date qd
where dd.date_dmk between qd.start_date and qd.end_date
group by qd.id, dd.calendar_year, dd.calendar_quarter_number
order by qd.id, dd.calendar_year, dd.calendar_quarter_number;

Calculate standard deviation over time

I have information about sales per day. For example:
Date - Product - Amount
01-07-2020 - A - 10
01-03-2020 - A - 20
01-02-2020 - B - 10
Now I would like to know the average sales per day and the standard deviation for the last year. For the average I can count the entries per item and pad with (365 minus that count) zeros, but I wonder what the best way is to calculate the standard deviation while including the zeros for the days on which there are no sales.
Use a hierarchical (or recursive) query to generate the daily dates for the year, then use a partitioned outer join (PARTITION BY on the LEFT OUTER JOIN) to join them to your product data. You can then compute the average and standard deviation with the AVG and STDDEV aggregate functions, using COALESCE to replace NULL values with zeroes:
WITH start_date ( dt ) AS (
SELECT DATE '2020-01-01' FROM DUAL
),
calendar ( dt ) AS (
SELECT dt + LEVEL - 1
FROM start_date
CONNECT BY dt + LEVEL - 1 < ADD_MONTHS( dt, 12 )
)
SELECT product,
AVG( COALESCE( amount, 0 ) ) AS average_sales_per_day,
STDDEV( COALESCE( amount, 0 ) ) AS stddev_sales_per_day
FROM calendar c
LEFT OUTER JOIN (
SELECT t.*
FROM test_data t
INNER JOIN start_date s
ON (
s.dt <= t."DATE"
AND t."DATE" < ADD_MONTHS( s.dt, 12 )
)
) t
PARTITION BY ( t.product )
ON ( c.dt = t."DATE" )
GROUP BY product
So, for your sample data:
CREATE TABLE test_data ( "DATE", Product, Amount ) AS
SELECT DATE '2020-07-01', 'A', 10 FROM DUAL UNION ALL
SELECT DATE '2020-03-01', 'A', 20 FROM DUAL UNION ALL
SELECT DATE '2020-02-01', 'B', 10 FROM DUAL;
This outputs:
PRODUCT | AVERAGE_SALES_PER_DAY | STDDEV_SALES_PER_DAY
:------ | ----------------------------------------: | ----------------------------------------:
A | .0819672131147540983606557377049180327869 | 1.16752986363678031669548047505759328696
B | .027322404371584699453551912568306010929 | .5227083734893166933219264686616717636897
db<>fiddle here
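The figures above can be cross-checked outside the database by padding each product's series with one zero per day without a sale and taking the sample standard deviation (n - 1 in the denominator). The sketch below does this in Python; `daily_stats` is a hypothetical helper, and it assumes that several sales of one product on the same day are summed into a single daily value, which the sample data never exercises.

```python
from datetime import date
from statistics import mean, stdev

# Sample data from the question: (day, product, amount)
sales = [
    (date(2020, 7, 1), "A", 10),
    (date(2020, 3, 1), "A", 20),
    (date(2020, 2, 1), "B", 10),
]

def daily_stats(rows, start, end):
    """Return {product: (mean, sample stddev)} over every day in
    [start, end), counting days without a sale as 0."""
    n_days = (end - start).days
    totals = {}
    for day, product, amount in rows:
        if start <= day < end:
            per_day = totals.setdefault(product, {})
            per_day[day] = per_day.get(day, 0) + amount
    stats = {}
    for product, per_day in totals.items():
        # One value per day with sales, plus a zero for every other day
        series = list(per_day.values()) + [0] * (n_days - len(per_day))
        stats[product] = (mean(series), stdev(series))
    return stats

stats = daily_stats(sales, date(2020, 1, 1), date(2021, 1, 1))
for product, (avg, sd) in sorted(stats.items()):
    print(product, avg, sd)
```

Because 2020 is a leap year, the series has 366 entries per product, which is why the averages are 30/366 and 10/366 rather than multiples of 1/365.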

SQL query needed - Counting 365 days backwards

I have searched the forum many times but couldn't find a solution for my situation. I am working with an Oracle database.
I have a table with all Order Numbers and Customer Numbers by Day. It looks like this:
Day | Customer Nbr | Order Nbr
2018-01-05 | 25687459 | 256
2018-01-09 | 36478592 | 398
2018-03-07 | 25687459 | 1547
and so on....
Now I need a SQL query that returns one row per day and Customer Nbr, with the number of unique Order Numbers within the 365 days counting back from the day in column 1.
For the example above the resulting table should look like:
Day | Customer Nbr | Order Cnt
2019-01-01 | 25687459 | 2
2019-01-02 | 25687459 | 2
...
2019-03-01 | 25687459 | 1
One method is to generate values for all days of interest for each customer and then use a correlated subquery:
with dates as (
select date '2019-01-01' + rownum as dte from dual
connect by date '2019-01-01' + rownum < sysdate
)
select d.dte, c.customer_nbr,
-- distinct orders in the 365 days ending on d.dte
(select count(distinct t2.order_nbr)
from t t2
where t2.customer_nbr = c.customer_nbr and
t2.day <= d.dte and
t2.day > d.dte - 365
) as order_cnt
from dates d cross join
(select distinct customer_nbr from t) c;
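To make the window boundaries concrete, here is a small Python sketch of the same idea: for every (day, customer) pair, count the distinct order numbers whose day falls in the 365 days ending on that day. The function name `rolling_order_counts` and the tuple layout are assumptions made for illustration; the data is the sample from the question.

```python
from datetime import date, timedelta

# Sample rows from the question: (day, customer_nbr, order_nbr)
orders = [
    (date(2018, 1, 5), 25687459, 256),
    (date(2018, 1, 9), 36478592, 398),
    (date(2018, 3, 7), 25687459, 1547),
]

def rolling_order_counts(rows, days, window_days=365):
    """Return {(day, customer_nbr): distinct order count} where an order
    counts if day - window_days < order_day <= day."""
    customers = sorted({cust for _, cust, _ in rows})
    counts = {}
    for day in days:
        window_start = day - timedelta(days=window_days)
        for cust in customers:
            seen = {nbr for d, c, nbr in rows
                    if c == cust and window_start < d <= day}
            counts[(day, cust)] = len(seen)
    return counts

days = [date(2019, 1, 1), date(2019, 1, 2), date(2019, 3, 1)]
counts = rolling_order_counts(orders, days)
for key in sorted(counts):
    print(key, counts[key])
```

This brute-force loop is only meant to show which rows fall inside the window; on a real table the set-based query is the better tool.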
Edit:
I've just seen you clarify the question, which I've interpreted to mean:
For every day in the last year, show how many orders there were for each customer between that date, and 1 year previously. Working on an answer now...
Updated Answer:
For each customer, we count the number of records between the order day, and 365 days before it...
WITH yourTable AS
(
SELECT SYSDATE - 1 Day, 'Alex' CustomerNbr FROM DUAL
UNION ALL
SELECT SYSDATE - 2, 'Alex' FROM DUAL
UNION ALL
SELECT SYSDATE - 366, 'Alex'FROM DUAL
UNION ALL
SELECT SYSDATE - 400, 'Alex'FROM DUAL
UNION ALL
SELECT SYSDATE - 500, 'Alex'FROM DUAL
UNION ALL
SELECT SYSDATE - 1, 'Joe'FROM DUAL
UNION ALL
SELECT SYSDATE - 300, 'Chris'FROM DUAL
UNION ALL
SELECT SYSDATE - 1, 'Chris'FROM DUAL
)
SELECT Day, CustomerNbr, OrdersLast365Days
FROM yourTable t
OUTER APPLY
(
SELECT COUNT(1) OrdersLast365Days
FROM yourTable t2
WHERE t.CustomerNbr = t2.CustomerNbr
AND TRUNC(t2.Day) >= TRUNC(t.Day) - 364
AND TRUNC(t2.Day) <= TRUNC(t.Day)
)
ORDER BY t.Day DESC, t.CustomerNbr;
If you want to report on just the days you have orders for, then a simple WHERE clause should be enough:
SELECT Day, CustomerNbr, COUNT(1) OrderCount
FROM <yourTable>
WHERE TRUNC(DAY) >= TRUNC(SYSDATE -364)
GROUP BY Day, CustomerNbr
ORDER BY Day Desc;
If you want to report on every day, you'll need to generate the days first. This can be done with a hierarchical (CONNECT BY) query in a CTE, which you then join to your table:
WITH last365Days AS
(
-- one row per day, from today back 365 days
SELECT TRUNC (SYSDATE - ROWNUM + 1) dt
FROM DUAL CONNECT BY ROWNUM <= 365
)
SELECT d.dt, COALESCE(t.CustomerNbr, 'None') CustomerNbr, SUM(CASE WHEN t.CustomerNbr IS NULL THEN 0 ELSE 1 END) OrderCount
FROM last365Days d
LEFT OUTER JOIN <yourTable> t
ON d.dt = TRUNC(t.Day)
GROUP BY d.dt, t.CustomerNbr
ORDER BY d.dt Desc;
I would probably have done it with an analytic function. In the windowing clause you can specify either a number of rows or a range; in this case I will use a range.
This gives you, for each customer and each day, the number of orders during the rolling year before the date displayed:
WITH DATES AS (
SELECT * FROM
(SELECT TRUNC(SYSDATE)-(LEVEL-1) AS DAY FROM DUAL CONNECT BY TRUNC(SYSDATE)-(LEVEL-1) >= ( SELECT MIN(TRUNC(DAY)) FROM MY_TABLE ))
CROSS JOIN
(SELECT DISTINCT CUST_ID FROM MY_TABLE))
SELECT DISTINCT
DATES.DAY,
DATES.CUST_ID,
COUNT(ORDER_ID) OVER (PARTITION BY DATES.CUST_ID ORDER BY DATES.DAY RANGE BETWEEN INTERVAL '1' YEAR PRECEDING AND INTERVAL '1' SECOND PRECEDING)
FROM
DATES
LEFT JOIN
MY_TABLE
ON DATES.DAY=TRUNC(MY_TABLE.DAY) AND DATES.CUST_ID=MY_TABLE.CUST_ID
ORDER BY DATES.CUST_ID,DATES.DAY;

Select min/max dates for periods that don't intersect

Example! I have a table with 4 columns. date format dd.MM.yy
id ban start end
1 1 01.01.15 31.12.18
1 1 02.02.15 31.12.18
1 1 05.04.15 31.12.17
In this case dates from rows 2 and 3 are included in dates from row 1
1 1 02.04.19 31.12.20
1 1 05.05.19 31.12.20
In this case the dates from row 5 are included in the dates from row 4. Basically we have 2 periods that don't intersect.
01.01.15 31.12.18
and
02.04.19 31.12.20
Situations where a row starts in one period and ends in another are impossible. The end result should look like this
1 1 01.01.15 31.12.18
1 1 02.04.19 31.12.20
I tried using analytic functions (LAG)
select id
, ban
, case
when start >= nvl(lag(start) over (partition by id, ban order by start, end asc), start)
and end <= nvl(lag(end) over (partition by id, ban order by start, end asc), end)
then nvl(lag(start) over (partition by id, ban order by start, end asc), start)
else start
end as start
, case
when start >= nvl(lag(start) over (partition by id, ban order by start, end asc), start)
and end <= nvl(lag(end) over (partition by id, ban order by start, end asc), end)
then nvl(lag(end) over (partition by id, ban order by start, end asc), end)
else end
end as end
from table
Where I order rows and if current dates are included in previous I replace them. It works if I have just 2 rows. For example this
1 1 08.09.15 31.12.99
1 1 31.12.15 31.12.99
turns into this
1 1 08.09.15 31.12.99
1 1 08.09.15 31.12.99
which I can then group by all fields and get what I want, but if there are more
1 2 13.11.15 31.12.99
1 2 31.12.15 31.12.99
1 2 16.06.15 31.12.99
I get
1 2 16.06.15 31.12.99
1 2 16.06.15 31.12.99
1 2 13.11.15 31.12.99
I understand why this happens, but how do I work around it? Running the query multiple times is not an option.
This query looks promising:
-- test data
with t(id, ban, dtstart, dtend) as (
select 1, 1, date '2015-01-01', date '2015-03-31' from dual union all
select 1, 1, date '2015-02-02', date '2015-03-31' from dual union all
select 1, 1, date '2015-03-15', date '2015-03-31' from dual union all
select 1, 1, date '2015-08-05', date '2015-12-31' from dual union all
select 1, 2, date '2015-01-01', date '2016-12-31' from dual union all
select 2, 1, date '2016-01-01', date '2017-12-31' from dual),
-- end of test data
step1 as (select id, ban, dt, to_number(inout) direction
from t unpivot (dt for inout in (dtstart as '1', dtend as '-1'))),
step2 as (select distinct id, ban, dt, direction,
sum(direction) over (partition by id, ban order by dt) sm
from step1),
step3 as (select id, ban, direction, dt dt1,
lead(dt) over (partition by id, ban order by dt) dt2
from step2
where (direction = 1 and sm = 1) or (direction = -1 and sm = 0) )
select id, ban, dt1, dt2
from step3 where direction = 1 order by id, ban, dt1
step1 - unpivot the dates and assign 1 to each start date and -1 to each end date (column direction)
step2 - add a cumulative sum of direction
step3 - keep only the interesting dates, then pivot the second date onto the same row using lead()
You can shorten this syntax; I divided it into steps to show what's going on.
Result:
ID BAN DT1 DT2
------ ---------- ----------- -----------
1 1 2015-01-01 2015-03-31
1 1 2015-08-05 2015-12-31
1 2 2015-01-01 2016-12-31
2 1 2016-01-01 2017-12-31
I assumed that for different (ID, BAN) we have to make calculations separately. If not - change partitioning and ordering in sum() and lead().
Pivot and unpivot work in Oracle 11 and later; for earlier versions you need case when.
BTW - START is a reserved word in Oracle, so in my example I changed the column names slightly.
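The three steps amount to a sweep over dated events: each start opens a period (+1), each end closes one (-1), and a merged period runs from the moment the running total leaves 0 until it returns to 0. The Python sketch below illustrates that idea on the same test data. It is not a line-by-line port of the query; `merge_periods` is a hypothetical name, and ordering starts before ends on the same date is an assumption about how touching periods should be treated.

```python
from datetime import date
from itertools import groupby

def merge_periods(rows):
    """rows: iterable of (id, ban, start, end). Returns merged,
    non-overlapping periods per (id, ban) as (id, ban, start, end)."""
    merged = []
    keyed = sorted(rows, key=lambda r: (r[0], r[1]))
    for (id_, ban), group in groupby(keyed, key=lambda r: (r[0], r[1])):
        # One +1 event per start date and one -1 event per end date
        events = []
        for _, _, start, end in group:
            events.append((start, 1))
            events.append((end, -1))
        # On the same date, handle starts before ends so that periods
        # touching on one day are merged rather than split
        events.sort(key=lambda e: (e[0], -e[1]))
        open_count, period_start = 0, None
        for day, direction in events:
            if open_count == 0:
                period_start = day  # running total leaves 0: period opens
            open_count += direction
            if open_count == 0:
                merged.append((id_, ban, period_start, day))
    return merged

rows = [
    (1, 1, date(2015, 1, 1), date(2015, 3, 31)),
    (1, 1, date(2015, 2, 2), date(2015, 3, 31)),
    (1, 1, date(2015, 3, 15), date(2015, 3, 31)),
    (1, 1, date(2015, 8, 5), date(2015, 12, 31)),
    (1, 2, date(2015, 1, 1), date(2016, 12, 31)),
    (2, 1, date(2016, 1, 1), date(2017, 12, 31)),
]
merged = merge_periods(rows)
for period in merged:
    print(period)
```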
I like to do this by identifying the period starts, then doing a cumulative sum to define the group, and a final aggregation:
select id, ban, min(start), max(end)
from (select t.*, sum(start_flag) over (partition by id, ban order by start) as grp
from (select t.*,
(case when exists (select 1
from t t2
where t2.id = t.id and t2.ban = t.ban and
t2.start < t.start and t2.end >= t.start
)
then 0 else 1
end) as start_flag
from t
) t
) t
group by id, ban, grp;