I have 3 tables:
INVENTORY_IN:
ID | INV_TIMESTAMP     | PRODUCT_ID | IN_QUANTITY | SUPPLIER_ID
...
 1 | 10.03.21 01:00:00 |        101 |         100 |           4
 2 | 11.03.21 02:00:00 |        101 |          50 |           3
 3 | 14.03.21 01:00:00 |        101 |          10 |           2
INVENTORY_OUT:
ID | INV_TIMESTAMP     | PRODUCT_ID | OUT_QUANTITY | CUSTOMER_ID
...
 1 | 10.03.21 02:00:00 |        101 |           30 |           1
 2 | 11.03.21 01:00:00 |        101 |           40 |           2
 3 | 12.03.21 01:00:00 |        101 |           80 |           1
INVENTORY_BALANCE:
INV_DATE | PRODUCT_ID | QUANTITY
...
09.03.21 |        101 |       20
10.03.21 |        101 |       90
11.03.21 |        101 |      100
12.03.21 |        101 |       20
13.03.21 |        101 |       20
14.03.21 |        101 |       30
I want to use FIFO (first in, first out) logic for the inventory, and to see which quantities correspond to each SUPPLIER-CUSTOMER combination.
The desired output looks like this (queried for dates >= 2021-03-10):
PRODUCT_ID | SUPPLIER_ID | CUSTOMER_ID | QUANTITY
       101 |             |           1 |       20
       101 |           4 |           1 |       60
       101 |           4 |           2 |       40
       101 |           3 |           1 |       30
       101 |           3 |             |       20
       101 |           2 |             |       10
edit: fixed a little typo in the numbers.
edit: added a diagram which explains every row. All of the black arrows correspond to supplier-customer combinations; there are 7 of them because for supplier_id = 4 and customer_id = 1 the desired result is the sum of the matched quantities between them. That explains why there are 7 arrows while the desired result contains only 6 rows.
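For reference, stepping through the FIFO matching by hand (treating the pre-existing balance of 20 on 09.03 as stock with no supplier) gives the same picture as the diagram:
The OUT of 30 on 10.03 (customer 1) takes the pre-existing 20 plus 10 from supplier 4.
The OUT of 40 on 11.03 (customer 2) takes 40 from supplier 4.
The OUT of 80 on 12.03 (customer 1) takes the remaining 50 from supplier 4 plus 30 from supplier 3.
That leaves 20 from supplier 3 and the whole 10 from supplier 2 unconsumed, with no customer.
Summing per supplier-customer pair gives the 6 rows above; supplier 4 / customer 1 is matched twice (10 + 50 = 60), hence 7 arrows but only 6 rows.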
Option 1
This is probably a job for PL/SQL. Starting with the data types to output:
CREATE TYPE supply_details_obj AS OBJECT(
  product_id  NUMBER,
  quantity    NUMBER,
  supplier_id NUMBER,
  customer_id NUMBER
);
CREATE TYPE supply_details_tab AS TABLE OF supply_details_obj;
Then we can define a pipelined function to read the INVENTORY_IN and INVENTORY_OUT tables one row at a time and merge the two keeping a running total of the remaining inventory or amount to supply:
CREATE FUNCTION assign_suppliers_to_customers (
  i_product_id IN INVENTORY_IN.PRODUCT_ID%TYPE
)
RETURN supply_details_tab PIPELINED
IS
  v_supplier_id  INVENTORY_IN.SUPPLIER_ID%TYPE;
  v_customer_id  INVENTORY_OUT.CUSTOMER_ID%TYPE;
  v_quantity_in  INVENTORY_IN.IN_QUANTITY%TYPE := NULL;
  v_quantity_out INVENTORY_OUT.OUT_QUANTITY%TYPE := NULL;
  v_cur_in       SYS_REFCURSOR;
  v_cur_out      SYS_REFCURSOR;
BEGIN
  OPEN v_cur_in FOR
    SELECT in_quantity, supplier_id
    FROM   INVENTORY_IN
    WHERE  product_id = i_product_id
    ORDER BY inv_timestamp;
  OPEN v_cur_out FOR
    SELECT out_quantity, customer_id
    FROM   INVENTORY_OUT
    WHERE  product_id = i_product_id
    ORDER BY inv_timestamp;
  LOOP
    IF v_quantity_in IS NULL THEN
      FETCH v_cur_in INTO v_quantity_in, v_supplier_id;
      IF v_cur_in%NOTFOUND THEN
        v_supplier_id := NULL;
      END IF;
    END IF;
    IF v_quantity_out IS NULL THEN
      FETCH v_cur_out INTO v_quantity_out, v_customer_id;
      IF v_cur_out%NOTFOUND THEN
        v_customer_id := NULL;
      END IF;
    END IF;
    EXIT WHEN v_cur_in%NOTFOUND AND v_cur_out%NOTFOUND;
    IF v_quantity_in > v_quantity_out THEN
      PIPE ROW(
        supply_details_obj(
          i_product_id,
          v_quantity_out,
          v_supplier_id,
          v_customer_id
        )
      );
      v_quantity_in  := v_quantity_in - v_quantity_out;
      v_quantity_out := NULL;
    ELSE
      PIPE ROW(
        supply_details_obj(
          i_product_id,
          v_quantity_in,
          v_supplier_id,
          v_customer_id
        )
      );
      v_quantity_out := v_quantity_out - v_quantity_in;
      v_quantity_in  := NULL;
    END IF;
  END LOOP;
END;
/
Then, for the sample data:
CREATE TABLE INVENTORY_IN ( ID, INV_TIMESTAMP, PRODUCT_ID, IN_QUANTITY, SUPPLIER_ID ) AS
SELECT 0, TIMESTAMP '2021-03-09 00:00:00', 101, 20, 0 FROM DUAL UNION ALL
SELECT 1, TIMESTAMP '2021-03-10 01:00:00', 101, 100, 4 FROM DUAL UNION ALL
SELECT 2, TIMESTAMP '2021-03-11 02:00:00', 101, 50, 3 FROM DUAL UNION ALL
SELECT 3, TIMESTAMP '2021-03-14 01:00:00', 101, 10, 2 FROM DUAL;
CREATE TABLE INVENTORY_OUT ( ID, INV_TIMESTAMP, PRODUCT_ID, OUT_QUANTITY, CUSTOMER_ID ) AS
SELECT 1, TIMESTAMP '2021-03-10 02:00:00', 101, 30, 1 FROM DUAL UNION ALL
SELECT 2, TIMESTAMP '2021-03-11 01:00:00', 101, 40, 2 FROM DUAL UNION ALL
SELECT 3, TIMESTAMP '2021-03-12 01:00:00', 101, 80, 1 FROM DUAL;
The query:
SELECT product_id,
supplier_id,
customer_id,
SUM( quantity ) AS quantity
FROM TABLE( assign_suppliers_to_customers( 101 ) )
GROUP BY
product_id,
supplier_id,
customer_id;
Outputs:
PRODUCT_ID | SUPPLIER_ID | CUSTOMER_ID | QUANTITY
---------: | ----------: | ----------: | -------:
101 | 0 | 1 | 20
101 | 4 | 1 | 60
101 | 4 | 2 | 40
101 | 3 | 1 | 30
101 | 3 | null | 20
101 | 2 | null | 10
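If more than one product needs to be processed, one option (a sketch only, assuming the function and tables created above, and that every product of interest appears in INVENTORY_IN) is to drive one call per product with a left-correlated TABLE() call:
-- one function call per distinct product; the collection expression may
-- reference columns of row sources to its left in the FROM clause
SELECT t.product_id,
       t.supplier_id,
       t.customer_id,
       SUM( t.quantity ) AS quantity
FROM   ( SELECT DISTINCT product_id FROM inventory_in ) p,
       TABLE( assign_suppliers_to_customers( p.product_id ) ) t
GROUP BY
       t.product_id,
       t.supplier_id,
       t.customer_id;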
Option 2
A (very) complicated SQL query:
WITH in_totals ( ID, INV_TIMESTAMP, PRODUCT_ID, IN_QUANTITY, SUPPLIER_ID, TOTAL_QUANTITY ) AS (
SELECT i.*,
SUM( in_quantity ) OVER ( PARTITION BY product_id ORDER BY inv_timestamp )
FROM inventory_in i
),
out_totals ( ID, INV_TIMESTAMP, PRODUCT_ID, OUT_QUANTITY, CUSTOMER_ID, TOTAL_QUANTITY ) AS (
SELECT o.*,
SUM( out_quantity ) OVER ( PARTITION BY product_id ORDER BY inv_timestamp )
FROM inventory_out o
),
split_totals ( product_id, inv_timestamp, supplier_id, customer_id, quantity ) AS (
SELECT i.product_id,
MIN( COALESCE( LEAST( i.inv_timestamp, o.inv_timestamp ), i.inv_timestamp ) )
AS inv_timestamp,
i.supplier_id,
o.customer_id,
SUM(
COALESCE(
LEAST(
i.total_quantity - o.total_quantity + o.out_quantity,
o.total_quantity - i.total_quantity + i.in_quantity,
i.in_quantity,
o.out_quantity
),
0
)
)
FROM in_totals i
LEFT OUTER JOIN
out_totals o
ON ( i.product_id = o.product_id
AND i.total_quantity - i.in_quantity <= o.total_quantity
AND i.total_quantity >= o.total_quantity - o.out_quantity )
GROUP BY
i.product_id,
i.supplier_id,
o.customer_id
ORDER BY
inv_timestamp
),
missing_totals ( product_id, inv_timestamp, supplier_id, customer_id, quantity ) AS (
SELECT i.product_id,
i.inv_timestamp,
i.supplier_id,
NULL,
i.in_quantity - COALESCE( s.quantity, 0 )
FROM inventory_in i
INNER JOIN (
SELECT product_id,
supplier_id,
SUM( quantity ) AS quantity
FROM split_totals
GROUP BY product_id, supplier_id
) s
ON ( i.product_id = s.product_id
AND i.supplier_id = s.supplier_id )
ORDER BY i.inv_timestamp
)
SELECT product_id, supplier_id, customer_id, quantity
FROM (
SELECT product_id, inv_timestamp, supplier_id, customer_id, quantity
FROM split_totals
WHERE quantity > 0
UNION ALL
SELECT product_id, inv_timestamp, supplier_id, customer_id, quantity
FROM missing_totals
WHERE quantity > 0
ORDER BY inv_timestamp
);
Which, for the sample data above, outputs:
PRODUCT_ID | SUPPLIER_ID | CUSTOMER_ID | QUANTITY
---------: | ----------: | ----------: | -------:
101 | 0 | 1 | 20
101 | 4 | 1 | 60
101 | 4 | 2 | 40
101 | 3 | 1 | 30
101 | 3 | null | 20
101 | 2 | null | 10
db<>fiddle here
If your system controls the timestamps so that you cannot consume what was not yet supplied (I have met systems that didn't track the intraday balance), then you can use a SQL solution with an interval join. The only thing to take care of here is to track the last supply that was not consumed in full: it should be added as a supply with no customer.
Here's the query with comments:
CREATE TABLE INVENTORY_IN ( ID, INV_TIMESTAMP, PRODUCT_ID, IN_QUANTITY, SUPPLIER_ID ) AS
SELECT 0, TIMESTAMP '2021-03-09 00:00:00', 101, 20, 0 FROM DUAL UNION ALL
SELECT 1, TIMESTAMP '2021-03-10 01:00:00', 101, 100, 4 FROM DUAL UNION ALL
SELECT 2, TIMESTAMP '2021-03-11 02:00:00', 101, 50, 3 FROM DUAL UNION ALL
SELECT 3, TIMESTAMP '2021-03-14 01:00:00', 101, 10, 2 FROM DUAL;
CREATE TABLE INVENTORY_OUT ( ID, INV_TIMESTAMP, PRODUCT_ID, OUT_QUANTITY, CUSTOMER_ID ) AS
SELECT 1, TIMESTAMP '2021-03-10 02:00:00', 101, 30, 1 FROM DUAL UNION ALL
SELECT 2, TIMESTAMP '2021-03-11 01:00:00', 101, 40, 2 FROM DUAL UNION ALL
SELECT 3, TIMESTAMP '2021-03-12 01:00:00', 101, 80, 1 FROM DUAL;
with i as (
select
/*Get the total per product and supplier at each timestamp,
so the running sum over timestamps does not need an over(... rows between ...) clause to resolve ties*/
inv_timestamp
, product_id
, supplier_id
, sum(in_quantity) as quan
, sum(sum(in_quantity)) over(
partition by product_id
order by
inv_timestamp asc
, supplier_id asc
) as rsum
from INVENTORY_IN
group by
product_id
, supplier_id
, inv_timestamp
)
, o as (
select /*The same for customer*/
inv_timestamp
, product_id
, customer_id
, sum(out_quantity) as quan
, sum(sum(out_quantity)) over(
partition by product_id
order by
inv_timestamp asc
, customer_id asc
) as rsum
/*Last consumption per product: when lead goes beyond the current window*/
, lead(0, 1, 1) over(
partition by product_id
order by
inv_timestamp asc
, customer_id asc
) as last_consumption
from INVENTORY_OUT
group by
product_id
, customer_id
, inv_timestamp
)
, distr as (
select
/*Distribute the quantity. This is the basic interval intersection:
new_value_to = least(t1.value_to, t2.value_to)
new_value_from = greatest(t1.value_from, t2.value_from)
So we need a capacity of the interval
*/
i.product_id
, least(i.rsum, nvl(o.rsum, i.rsum))
- greatest(i.rsum - i.quan, nvl(o.rsum - o.quan, i.rsum - i.quan)) as supplied_quan
/*At the last supply we can have something not used.
Calculate it to add later as not consumed
*/
, case
when last_consumption = 1
and i.rsum > nvl(o.rsum, i.rsum)
then i.rsum - o.rsum
end as rest_quan
, i.supplier_id
, o.customer_id
, i.inv_timestamp as i_ts
, o.inv_timestamp as o_ts
from i
left join o
on i.product_id = o.product_id
/*No equality here, because the running sums are continuous boundaries:
>= would count the same boundary value in two intervals whenever one table's value_to
equals the other table's value_to (which is the value_from of its next interval)*/
and i.rsum > o.rsum - o.quan
and o.rsum > i.rsum - i.quan
)
select
product_id
, supplier_id
, customer_id
, sum(quan) as quan
from (
select /*Get distributed quantities*/
product_id
, supplier_id
, customer_id
, supplied_quan as quan
, i_ts
, o_ts
from distr
union all
select /*Add not consumed part of last consumed supply*/
product_id
, supplier_id
, null
, rest_quan
, i_ts
, null /*No consumption*/
from distr
where rest_quan is not null
)
group by
product_id
, supplier_id
, customer_id
order by
min(i_ts) asc
/*To order not consumed last*/
, min(o_ts) asc nulls last
PRODUCT_ID | SUPPLIER_ID | CUSTOMER_ID | QUAN
---------: | ----------: | ----------: | ---:
101 | 0 | 1 | 20
101 | 4 | 1 | 60
101 | 4 | 2 | 40
101 | 3 | 1 | 30
101 | 3 | null | 20
101 | 2 | null | 10
db<>fiddle here
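To make the distribution step easier to follow, here is the interval picture for the sample data above (running sums per product 101): the supplies cover (0, 20] for supplier 0, (20, 120] for supplier 4, (120, 170] for supplier 3 and (170, 180] for supplier 2, while the consumptions cover (0, 30] for customer 1, (30, 70] for customer 2 and (70, 150] for customer 1 again. The overlap of each supply/consumption pair, least(i.rsum, o.rsum) - greatest(i.rsum - i.quan, o.rsum - o.quan), gives 20 for supplier 0 / customer 1, 10 and 50 for supplier 4 / customer 1, 40 for supplier 4 / customer 2 and 30 for supplier 3 / customer 1; the unmatched tails of 20 (supplier 3) and 10 (supplier 2) become the rows with no customer.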
I am trying to use the Oracle PIVOT function to display the data in the format below. I have tried examples I found on Stack Overflow, but I am unable to achieve what I am looking for.
With t as
(
select 1335 as emp_id, 'ADD Insurance New' as suuid, sysdate- 10 as startdate, null as enddate from dual
union all
select 1335 as emp_id, 'HS' as suuid, sysdate- 30 as startdate, null as enddate from dual
union all
select 1335 as emp_id, 'ADD Ins' as suuid, sysdate- 30 as startdate, Sysdate - 10 as enddate from dual
)
select * from t
Desired output:
+--------+-------------------+-------------------+---------+-------------------+
| EMP_ID | SUUID_1 | SUUID_1_STARTDATE | SUUID_2 | SUUID_2_STARTDATE |
+--------+-------------------+-------------------+---------+-------------------+
| 1335 | ADD Insurance New | 10/5/2020 15:52 | HS | 9/15/2020 15:52 |
+--------+-------------------+-------------------+---------+-------------------+
Can anyone suggest how to use SQL PIVOT to get this format?
You can use conditional aggregation. There is more than one way to understand your question, but one approach that would work for your sample data is:
select emp_id,
max(case when rn = 1 then suuid end) suuid_1,
max(case when rn = 1 then startdate end) suuid_1_startdate,
max(case when rn = 2 then suuid end) suuid_2,
max(case when rn = 2 then startdate end) suuid_2_startdate
from (
select t.*, row_number() over(partition by emp_id order by startdate desc) rn
from t
where enddate is null
) t
group by emp_id
Demo on DB Fiddle:
EMP_ID | SUUID_1           | SUUID_1_STARTDATE | SUUID_2 | SUUID_2_STARTDATE
-----: | :---------------- | :---------------- | :------ | :----------------
1335 | ADD Insurance New | 05-OCT-20 | HS | 15-SEP-20
You can do it with PIVOT:
With t ( emp_id, suuid, startdate, enddate ) as
(
select 1335, 'ADD Insurance New', sysdate- 10, null from dual union all
select 1335, 'HS', sysdate- 30, null from dual union all
select 1335, 'ADD Ins', sysdate- 30, Sysdate - 10 from dual
)
SELECT emp_id,
"1_SUUID" AS suuid1,
"1_STARTDATE" AS suuid_startdate1,
"2_SUUID" AS suuid2,
"2_STARTDATE" AS suuid_startdate2
FROM (
SELECT t.*,
ROW_NUMBER() OVER ( ORDER BY startdate DESC, enddate DESC NULLS FIRST )
AS rn
FROM t
)
PIVOT (
MAX( suuid ) AS suuid,
MAX( startdate ) AS startdate,
MAX( enddate ) AS enddate
FOR rn IN ( 1, 2 )
)
Outputs:
EMP_ID | SUUID1 | SUUID_STARTDATE1 | SUUID2 | SUUID_STARTDATE2
-----: | :---------------- | :--------------- | :----- | :---------------
1335 | ADD Insurance New | 05-OCT-20 | HS | 15-SEP-20
db<>fiddle here
Working on a problem for a company in Japan. The government has some rules such as... If you are on a work visa:
You cannot work for more than 3 years at a company without taking 30 days off
You cannot work for the same staffing company for more than 5 years without taking 6 months off
So we want to figure out if anyone will be violating either rule in the next 30/60/90 days.
Sample data (list of contracts):
if object_id('tempdb..#sampleDates') is not null drop table #sampleDates
create table #sampleDates (UserId int, CompanyID int, WorkPeriodStart datetime, WorkPeriodEnd datetime)
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27809, 972, '2019-10-10', '2020-10-10')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27853, 484, '2019-10-10', '2020-10-10')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27856, 172, '2019-10-10', '2020-10-10')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27857, 1234, '2015-01-01', '2015-12-31')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27857, 1234, '2016-01-01', '2017-02-28')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27857, 1234, '2017-01-01', '2017-12-31')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27857, 1234, '2018-01-01', '2018-12-31')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27857, 1234, '2019-01-01', '2020-01-31')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27857, 1234, '2020-01-01', '2020-12-31')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27897, 179, '2019-10-10', '2020-10-10')
My first issue is possibly overlapping dates. I am close to a solution for that already, but until I know how to solve the working-X-years / Y-days-off issue, I'm not sure what the output of my CTE or temp table should look like.
I don't expect anyone to do the work for me, but I want to find an article that can tell me:
How can I determine if someone has taken any breaks in the time period, and for how long (gaps between date ranges)?
How can I figure if they will have worked for 3/5 years without a 30/180 days break in the next 30/60/90 days?
This seemed so simple until I started coding the procedure.
Thanks for any help in advance.
EDIT:
For what it's worth, here's my second working attempt at eliminating overlapping dates (the first version used a dense_rank approach and worked until I screwed something up, so I went with something simpler):
;with CJ as (
select UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd from #sampleDates c
)
select
c.CompanyID,
c.WorkPeriodStart,
min(t1.WorkPeriodEnd) as EndDate
from CJ c
inner join CJ t1 on c.WorkPeriodStart <= t1.WorkPeriodEnd and c.UserId = t1.UserId and c.CompanyID = t1.CompanyID
and not exists(select * from CJ t2 where t1.UserId = t2.UserId and t1.CompanyID = t2.CompanyID and t1.WorkPeriodEnd >= t2.WorkPeriodStart AND t1.WorkPeriodEnd < t2.WorkPeriodEnd)
where not exists(select * from CJ c2 where c.UserId = c2.UserId and c.CompanyID = c2.CompanyID and c.WorkPeriodStart > c2.WorkPeriodStart AND c.WorkPeriodStart <= c2.WorkPeriodEnd)
group by c.UserId, c.CompanyID, c.WorkPeriodStart
order by c.UserId, c.WorkPeriodStart
Disclaimer: This is an incomplete answer.
I can continue later, but this shows how to compute the islands. From there, identifying the offending ones shouldn't be that complicated.
See the augmented example below: I added user 27897, who has three islands: 0, 1, and 2.
create table t (UserId int, CompanyID int, WorkPeriodStart date, WorkPeriodEnd date);
insert t (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values
(27809, 972, '2019-10-10', '2020-10-10'),
(27853, 484, '2019-10-10', '2020-10-10'),
(27856, 172, '2019-10-10', '2020-10-10'),
(27857, 1234, '2015-01-01', '2015-12-31'),
(27857, 1234, '2016-01-01', '2017-02-28'),
(27857, 1234, '2017-01-01', '2017-12-31'),
(27857, 1234, '2018-01-01', '2018-12-31'),
(27857, 1234, '2019-01-01', '2020-01-31'),
(27857, 1234, '2020-01-01', '2020-12-31'),
(27897, 179, '2015-05-28', '2015-09-30'),
(27897, 179, '2017-03-11', '2017-04-30'),
(27897, 188, '2017-02-20', '2017-07-07'),
(27897, 179, '2019-10-10', '2020-10-10');
With this data, the query that computes the island for each row can look like:
select *,
sum(hop) over(partition by UserId order by WorkPeriodStart) as island
from (
select *,
case when WorkPeriodStart > dateadd(day, 1, max(WorkPeriodEnd)
over(partition by UserId
order by WorkPeriodStart
rows between unbounded preceding and 1 preceding))
then 1 else 0 end as hop
from t
) x
order by UserId, WorkPeriodStart
Result:
UserId CompanyID WorkPeriodStart WorkPeriodEnd hop island
------ --------- --------------- ------------- --- ------
27809 972 2019-10-10 2020-10-10 0 0
27853 484 2019-10-10 2020-10-10 0 0
27856 172 2019-10-10 2020-10-10 0 0
27857 1234 2015-01-01 2015-12-31 0 0
27857 1234 2016-01-01 2017-02-28 0 0
27857 1234 2017-01-01 2017-12-31 0 0
27857 1234 2018-01-01 2018-12-31 0 0
27857 1234 2019-01-01 2020-01-31 0 0
27857 1234 2020-01-01 2020-12-31 0 0
27897 179 2015-05-28 2015-09-30 0 0
27897 188 2017-02-20 2017-07-07 1 1
27897 179 2017-03-11 2017-04-30 0 1
27897 179 2019-10-10 2020-10-10 1 2
Now, we can augment this query to get the "worked days" for each island, and the "days off" before each island, by doing:
select *,
datediff(day, s, e) + 1 as worked,
datediff(day, lag(e) over(partition by UserId order by island), s) as prev_days_off
from (
select UserId, island, min(WorkPeriodStart) as s, max(WorkPeriodEnd) as e
from (
select *,
sum(hop) over(partition by UserId order by WorkPeriodStart) as island
from (
select *,
case when WorkPeriodStart > dateadd(day, 1, max(WorkPeriodEnd)
over(partition by UserId
order by WorkPeriodStart
rows between unbounded preceding and 1 preceding))
then 1 else 0 end as hop
from t
) x
) y
group by UserId, island
) x
order by UserId, island
Result:
UserId island s e worked prev_days_off
------ ------ ---------- ---------- ------ -------------
27809 0 2019-10-10 2020-10-10 367 <null>
27853 0 2019-10-10 2020-10-10 367 <null>
27856 0 2019-10-10 2020-10-10 367 <null>
27857 0 2015-01-01 2020-12-31 2192 <null>
27897 0 2015-05-28 2015-09-30 126 <null>
27897 1 2017-02-20 2017-07-07 138 509
27897 2 2019-10-10 2020-10-10 367 825
This result is much closer to what you need. That data is actually enough to filter rows according to your criteria.
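One possible way to finish this off (a sketch only, not part of the answer above; it keeps the per-user islands and deliberately ignores the per-company vs. per-staffing-company distinction, which would need CompanyID added to the partitioning and grouping) is to flag users whose current island reaches the 3-year mark now or within the next 30/60/90 days, assuming they keep working day for day:
with islands as (
    select UserId, island,
           min(WorkPeriodStart) as s,
           max(WorkPeriodEnd)   as e,
           datediff(day, min(WorkPeriodStart), max(WorkPeriodEnd)) + 1 as worked
    from (
        select *,
               sum(hop) over(partition by UserId order by WorkPeriodStart) as island
        from (
            select *,
                   case when WorkPeriodStart > dateadd(day, 1, max(WorkPeriodEnd)
                            over(partition by UserId
                                 order by WorkPeriodStart
                                 rows between unbounded preceding and 1 preceding))
                        then 1 else 0 end as hop
            from t
        ) x
    ) y
    group by UserId, island
)
select UserId, island, s, e, worked,
       -- sketch: 3-year rule applied per user here; add CompanyID for the per-company rule
       case when worked      >= 3 * 365 then 1 else 0 end as violation_now,
       case when worked + 30 >= 3 * 365 then 1 else 0 end as violation_30,
       case when worked + 60 >= 3 * 365 then 1 else 0 end as violation_60,
       case when worked + 90 >= 3 * 365 then 1 else 0 end as violation_90
from islands
-- only islands that are still running today can keep growing
where e >= cast(getdate() as date)
order by UserId, island;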
This script merges any overlapping work periods and then calculates the total days worked within the previous 3- and 5-year periods. It then determines whether that value exceeds the maximum working days allowed within the period, by UserId and CompanyId for the 3-year limit and just by UserId for the 5-year limit. (Is this a correct interpretation of the rules in your question?)
From this it then simply adds 30, 60 and 90 days to that total to see if the larger value would be over the respective limits. Given the different grouping rules, this would be cleaner as 2 queries (no duplication of UserId for the 5-year rule), but the result is still a flag against any offending UserId.
In the example below you can see UserId = 27857 only violating the 5 year rule at present, but then also violating the 3 year rule should they stay on for another 60 days. In addition, UserId = 27858 is currently okay, but will violate the 5 year rule in 60 days.
I have made some assumptions about how you define a year and whether or not your WorkPeriodEnd values are inclusive or not, so do check that your required logic is properly applied.
Script
if object_id('tempdb..#sampleDates') is not null drop table #sampleDates
create table #sampleDates (UserId int, CompanyId int, WorkPeriodStart datetime, WorkPeriodEnd datetime)
insert #sampleDates values
(27809, 972, '2019-10-10', '2020-10-10')
,(27853, 484, '2019-10-10', '2020-10-10')
,(27856, 172, '2019-10-10', '2020-10-10')
,(27857, 1234, '2015-01-01', '2015-12-31')
,(27857, 1234, '2016-01-01', '2017-02-28')
,(27857, 1234, '2017-01-01', '2017-12-31')
,(27857, 1234, '2018-01-01', '2018-12-31')
,(27857, 1234, '2019-01-01', '2020-01-31')
,(27857, 1234, '2020-01-01', '2020-05-31')
,(27858, 1234, '2015-01-01', '2015-12-31')
,(27858, 1234, '2016-01-01', '2017-02-28')
,(27858, 1234, '2017-01-01', '2017-12-31')
,(27858, 1234, '2018-01-01', '2018-12-31')
,(27858, 1234, '2019-09-01', '2020-01-31')
,(27858, 1234, '2020-01-01', '2020-08-31')
,(27859, 12345, '2015-01-01', '2015-12-31')
,(27859, 12346, '2016-01-01', '2017-02-28')
,(27859, 12347, '2017-01-01', '2017-12-31')
,(27859, 12348, '2018-01-01', '2018-12-31')
,(27859, 12349, '2019-01-01', '2020-01-31')
,(27859, 12340, '2020-01-01', '2020-12-31')
,(27897, 179, '2019-10-10', '2020-10-10')
;
declare #3YearsAgo date = dateadd(year,-3,getdate());
declare #3YearWorkingDays int = (365*3)-30;
declare #5YearsAgo date = dateadd(year,-5,getdate());
declare #5YearWorkingDays int = (365*5)-(365/2);
with p as
(
select UserId
,CompanyId
,min(WorkPeriodStart) as WorkPeriodStart
,max(WorkPeriodEnd) as WorkPeriodEnd
from(select l.*,
sum(case when dateadd(day,1,l.PrevEnd) < l.WorkPeriodStart then 1 else 0 end) over (partition by l.UserId, l.CompanyId order by l.WorkPeriodStart rows unbounded preceding) as grp
from(select d.*,
lag(d.WorkPeriodEnd) over (partition by d.UserId, d.CompanyId order by d.WorkPeriodEnd) as PrevEnd
from #sampleDates as d
) as l
) as g
group by grp
,UserId
,CompanyId
)
,d as
(
select UserId
,CompanyId
,sum(case when #3YearsAgo < WorkPeriodEnd
then datediff(day
,case when #3YearsAgo between WorkPeriodStart and WorkPeriodEnd then #3YearsAgo else WorkPeriodStart end
,WorkPeriodEnd
)
else 0
end
) as WorkingDays3YearsToToday
,sum(case when #5YearsAgo < WorkPeriodEnd
then datediff(day
,case when #5YearsAgo between WorkPeriodStart and WorkPeriodEnd then #5YearsAgo else WorkPeriodStart end
,WorkPeriodEnd
)
else 0
end
) as WorkingDays5YearsToToday
from p
group by UserId
,CompanyId
)
select UserId
,CompanyId
,#3YearWorkingDays as Limit3Year
,#5YearWorkingDays as Limit5Year
,WorkingDays3YearsToToday
,WorkingDays5YearsToToday
,case when WorkingDays3YearsToToday > #3YearWorkingDays then 1 else 0 end as Violation3YearNow
,case when sum(WorkingDays5YearsToToday) over (partition by UserId) > #5YearWorkingDays then 1 else 0 end as Violation5YearNow
,case when WorkingDays3YearsToToday + 30 > #3YearWorkingDays then 1 else 0 end as Violation3Year30Day
,case when sum(WorkingDays5YearsToToday) over (partition by UserId) + 30 > #5YearWorkingDays then 1 else 0 end as Violation5Year30Day
,case when WorkingDays3YearsToToday + 60 > #3YearWorkingDays then 1 else 0 end as Violation3Year60Day
,case when sum(WorkingDays5YearsToToday) over (partition by UserId) + 60 > #5YearWorkingDays then 1 else 0 end as Violation5Year60Day
,case when WorkingDays3YearsToToday + 90 > #3YearWorkingDays then 1 else 0 end as Violation3Year90Day
,case when sum(WorkingDays5YearsToToday) over (partition by UserId) + 90 > #5YearWorkingDays then 1 else 0 end as Violation5Year90Day
from d
order by UserId
,CompanyId;
Output
+--------+-----------+------------+------------+--------------------------+--------------------------+-------------------+-------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+
| UserId | CompanyId | Limit3Year | Limit5Year | WorkingDays3YearsToToday | WorkingDays5YearsToToday | Violation3YearNow | Violation5YearNow | Violation3Year30Day | Violation5Year30Day | Violation3Year60Day | Violation5Year60Day | Violation3Year90Day | Violation5Year90Day |
+--------+-----------+------------+------------+--------------------------+--------------------------+-------------------+-------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+
| 27809 | 972 | 1065 | 1643 | 366 | 366 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 27853 | 484 | 1065 | 1643 | 366 | 366 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 27856 | 172 | 1065 | 1643 | 366 | 366 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 27857 | 1234 | 1065 | 1643 | 1029 | 1760 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
| 27858 | 1234 | 1065 | 1643 | 877 | 1608 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 27859 | 12340 | 1065 | 1643 | 365 | 365 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 27859 | 12345 | 1065 | 1643 | 0 | 147 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 27859 | 12346 | 1065 | 1643 | 0 | 424 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 27859 | 12347 | 1065 | 1643 | 147 | 364 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 27859 | 12348 | 1065 | 1643 | 364 | 364 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 27859 | 12349 | 1065 | 1643 | 395 | 395 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 27897 | 179 | 1065 | 1643 | 366 | 366 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+--------+-----------+------------+------------+--------------------------+--------------------------+-------------------+-------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+
Here is what I ended up with.
<UselessExplanation>
The issues I kept facing were:
How can I handle any and all date range overlaps and determine just the days within the contract date ranges.
The client is STILL using SQL 2008, so I need some old(er) school tsql.
Ensure that the break times (times between contracts) is accurately calculated.
So I chose to come up with my own solution, which is probably dumb given that it needs to generate a record in memory for every workday/candidate combination. I do not see the contracts table going beyond the 5-10k record range, which is the only reason I'm going in this direction.
I created a calendar table with every date in it from 1/1/1980 - 12/31/2050
I then left joined the contract ranges against the calendar table by CandidateId. These will be the dates worked.
Any date in the calendar table that does not match a date within a contract range is a break day.
</UselessExplanation>
Calendar table
if object_id('CalendarTable') is not null drop table CalendarTable
go
create table CalendarTable (pk int identity, CalendarDate date )
declare #StartDate date = cast('1980-01-01' as date)
declare #EndDate date = cast('2050-12-31' as date)
while #StartDate <= #EndDate
begin
insert into CalendarTable ( CalendarDate ) values ( #StartDate )
set #StartDate = dateadd(dd, 1, #StartDate)
end
go
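As a side note (a sketch, not part of the original post), the same CalendarTable can be filled without a loop using a set-based approach, which is typically faster for the ~26k rows involved and should also work on SQL Server 2008:
;with n as (
    -- enough rows to cover 1980-01-01 .. 2050-12-31 (about 26k days)
    select top (datediff(day, '1980-01-01', '2050-12-31') + 1)
           row_number() over (order by (select null)) - 1 as i
    from sys.all_objects a
    cross join sys.all_objects b
)
-- assumes CalendarTable(pk, CalendarDate) as created above
insert into CalendarTable ( CalendarDate )
select dateadd(day, i, cast('1980-01-01' as date))
from n;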
Query for 5-year violations (working 5 years without a 6-month cool-off period)
declare #enddate date = dateadd(dd, 30, getdate())
declare #beginDate date = dateadd(dd, -180, dateadd(year, -5, getdate()))
select poss.CandidateId,
min(work.CalendarDate) as FirstWorkDate,
count(work.CandidateId) as workedDays,
sum(case when work.CandidateId is null then 1 else 0 end) as breakDays,
case when count(work.CandidateId) > (365*5) and sum(case when work.CandidateId is null then 1 else 0 end) < (365/2) then 1 else 0 end as Year5Violation,
case when count(work.CandidateId) > (365*5) and sum(case when work.CandidateId is null then 1 else 0 end) < (365/2) then DATEADD(year, 5, min(work.CalendarDate)) else null end as ViolationDate
from
(
select cand.CandidateId, cal.CalendarDate
from CalendarTable cal
join (select distinct c.CandidateId from contracts c where c.WorkPeriodStart is not null and c.WorkPeriodEnd is not null and c.Deleted = 0) cand on 1 = 1
where cal.CalendarDate between #beginDate and #enddate
) as poss
left join
(
select distinct c.CandidateId, cal.CalendarDate
from contracts c
join CalendarTable cal on cal.CalendarDate between c.WorkPeriodStart and c.WorkPeriodEnd
where c.WorkPeriodStart is not null and c.WorkPeriodEnd is not null and c.Deleted = 0
) as work on work.CandidateId = poss.CandidateId and work.CalendarDate = poss.CalendarDate
group by poss.CandidateId
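For the 3-year rule (3 years at the same company without a 30-day break), a near-copy of the query above could be grouped per candidate and company. This is a sketch only; it assumes the contracts table also carries a CompanyID column, as in the #sampleDates data:
declare @enddate date = dateadd(dd, 30, getdate())
declare @beginDate date = dateadd(dd, -30, dateadd(year, -3, getdate()))
select poss.CandidateId,
       poss.CompanyID,
       min(work.CalendarDate) as FirstWorkDate,
       count(work.CandidateId) as workedDays,
       sum(case when work.CandidateId is null then 1 else 0 end) as breakDays,
       case when count(work.CandidateId) > (365*3)
             and sum(case when work.CandidateId is null then 1 else 0 end) < 30
            then 1 else 0 end as Year3Violation
from
(
    -- every calendar day in the window for every candidate/company pair
    select cand.CandidateId, cand.CompanyID, cal.CalendarDate
    from CalendarTable cal
    join (select distinct c.CandidateId, c.CompanyID   -- assumes contracts.CompanyID exists
          from contracts c
          where c.WorkPeriodStart is not null and c.WorkPeriodEnd is not null and c.Deleted = 0) cand on 1 = 1
    where cal.CalendarDate between @beginDate and @enddate
) as poss
left join
(
    -- days actually worked per candidate/company
    select distinct c.CandidateId, c.CompanyID, cal.CalendarDate
    from contracts c
    join CalendarTable cal on cal.CalendarDate between c.WorkPeriodStart and c.WorkPeriodEnd
    where c.WorkPeriodStart is not null and c.WorkPeriodEnd is not null and c.Deleted = 0
) as work on work.CandidateId = poss.CandidateId
         and work.CompanyID = poss.CompanyID
         and work.CalendarDate = poss.CalendarDate
group by poss.CandidateId, poss.CompanyID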
I have a list of account balances over time. The schema looks like this:
+-------------+---------+---------+----------------------+
| customer_id | city_id | value | timestamp |
+-------------+---------+---------+----------------------+
| 1 | 1 | -500 | 2019-02-12T00:00:00 |
| 2 | 1 | -200 | 2019-02-12T00:00:00 |
| 3 | 2 | 200 | 2019-02-10T00:00:00 |
| 4 | 1 | -10 | 2019-02-09T00:00:00 |
+-------------+---------+---------+----------------------+
I want to aggregate this data, such that I get the daily total negative account balance partitioned by city and ordered by time:
+---------+---------+--------------+
| city_id | value | timestamp |
+---------+---------+--------------+
| 1 | -500 | 2019-02-12 |
| 1 | -200 | 2019-02-10 |
| 1 | -10 | 2019-02-09 |
+---------+---------+--------------+
What I've tried:
SELECT city_id, FORMAT_TIMESTAMP("%Y-%m-%d", TIMESTAMP(timestamp)) as date,
SUM(value) OVER (PARTITION BY city_id ORDER BY FORMAT_TIMESTAMP("%Y-%m-%d", TIMESTAMP(timestamp))) negative_account_balance
FROM `account_balances`
WHERE value < 0
However this gives me strange account balance values like -5.985856421224E10. Any ideas why? Besides that, the query generates entries for the same city and same day multiple times. I would expect it to return the same city only once for the same day.
Below is for BigQuery Standard SQL
#standardSQL
SELECT city_id, account_balance, `date` FROM (
SELECT city_id, `date`,
SUM(value) OVER(PARTITION BY city_id ORDER BY `date`) account_balance
FROM (
SELECT city_id, DATE(TIMESTAMP(t.timestamp)) AS `date`, SUM(value) value
FROM `project.dataset.account_balances` t
GROUP BY city_id, `date` )
)
WHERE account_balance< 0
The analytic SUM(value) OVER (...) in your original query returns one running-total row per input row instead of collapsing rows, which is why the same city and day showed up multiple times; aggregating per day first (the inner GROUP BY) and only then applying the window avoids that. You can test and play with the above using sample/dummy data as in the example below
#standardSQL
WITH `project.dataset.account_balances` AS (
SELECT 1 customer_id, 1 city_id, -500 value, '2019-02-12T00:00:00' `timestamp` UNION ALL
SELECT 2, 1, -200, '2019-02-12T00:00:00' UNION ALL
SELECT 5, 1, 100, '2019-02-13T00:00:00' UNION ALL
SELECT 3, 2, 200, '2019-02-10T00:00:00' UNION ALL
SELECT 4, 1, -10, '2019-02-09T00:00:00'
)
SELECT city_id, account_balance, `date` FROM (
SELECT city_id, `date`,
SUM(value) OVER(PARTITION BY city_id ORDER BY `date`) account_balance
FROM (
SELECT city_id, DATE(TIMESTAMP(t.timestamp)) AS `date`, SUM(value) value
FROM `project.dataset.account_balances` t
GROUP BY city_id, `date` )
)
WHERE account_balance< 0
which produces the result below:
Row city_id account_balance date
1 1 -10 2019-02-09
2 1 -710 2019-02-12
3 1 -610 2019-02-13
I took a simpler approach and used this SQL (by the way, when I tried your original query I got a result which seems OK):
SELECT city_id, FORMAT_TIMESTAMP("%Y-%m-%d", TIMESTAMP(timestamp)) as date,
SUM(value) as value
FROM `account_balances`
GROUP BY city_id, timestamp
HAVING value < 0
I used this data to check it out (Note: I changed the date format to match BigQuery format although the result is the same either way)
WITH account_balances as (
SELECT 1 AS customer_id, 1 as city_id, -500 as value, '2019-02-12 00:00:00' as timestamp UNION ALL
SELECT 2 AS customer_id, 1 as city_id, -200 as value, '2019-02-12 00:00:00' as timestamp UNION ALL
SELECT 3 AS customer_id, 2 as city_id, 200 as value, '2019-02-10 00:00:00' as timestamp UNION ALL
SELECT 4 AS customer_id, 1 as city_id, -10 as value, '2019-02-09 00:00:00' as timestamp
)
SELECT city_id, FORMAT_TIMESTAMP("%Y-%m-%d", TIMESTAMP(timestamp)) as date,
SUM(value) as value
FROM `account_balances`
GROUP BY city_id, timestamp
HAVING value < 0
This is the result (screenshot omitted).
I would like to identify the returning customers from an Oracle(11g) table like this:
CustID | Date
-------|----------
XC321 | 2016-04-28
AV626 | 2016-05-18
DX970 | 2016-06-23
XC321 | 2016-05-28
XC321 | 2016-06-02
So I can see which customers returned within various windows, for example within 10, 20, 30, 40 or 50 days:
CustID | 10_day | 20_day | 30_day | 40_day | 50_day
-------|--------|--------|--------|--------|--------
XC321 | | | 1 | |
XC321 | | | | 1 |
I would even accept a result like this:
CustID | Date | days_from_last_visit
-------|------------|---------------------
XC321 | 2016-05-28 | 30
XC321 | 2016-06-02 | 5
I guess it would use a partition-by windowing clause with UNBOUNDED FOLLOWING and PRECEDING, but I cannot find any suitable examples.
Any ideas...?
Thanks
You can do this with conditional aggregation using a CASE expression over the days since the previous visit (computed with LEAD):
SELECT t.custID,
COUNT(CASE WHEN (t.date - last_visit) <= 10 THEN 1 END) as "10_day",
COUNT(CASE WHEN (t.date - last_visit) between 11 and 20 THEN 1 END) as "20_day",
COUNT(CASE WHEN (t.date - last_visit) between 21 and 30 THEN 1 END) as "30_day",
.....
FROM (SELECT s.custID,
LEAD(s.date) OVER(PARTITION BY s.custID ORDER BY s.date DESC) as last_visit
FROM YourTable s) t
GROUP BY t.custID
Oracle Setup:
CREATE TABLE customers ( CustID, Activity_Date ) AS
SELECT 'XC321', DATE '2016-04-28' FROM DUAL UNION ALL
SELECT 'AV626', DATE '2016-05-18' FROM DUAL UNION ALL
SELECT 'DX970', DATE '2016-06-23' FROM DUAL UNION ALL
SELECT 'XC321', DATE '2016-05-28' FROM DUAL UNION ALL
SELECT 'XC321', DATE '2016-06-02' FROM DUAL;
Query:
SELECT *
FROM (
SELECT CustID,
Activity_Date AS First_Date,
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '10' DAY FOLLOWING )
- 1 AS "10_Day",
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '20' DAY FOLLOWING )
- 1 AS "20_Day",
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '30' DAY FOLLOWING )
- 1 AS "30_Day",
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '40' DAY FOLLOWING )
- 1 AS "40_Day",
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '50' DAY FOLLOWING )
- 1 AS "50_Day",
ROW_NUMBER() OVER ( PARTITION BY CustID ORDER BY Activity_Date ) AS rn
FROM Customers
)
WHERE rn = 1;
Output
CUSTID FIRST_DATE 10_Day 20_Day 30_Day 40_Day 50_Day RN
------ ------------------- ---------- ---------- ---------- ---------- ---------- ----------
AV626 2016-05-18 00:00:00 0 0 0 0 0 1
DX970 2016-06-23 00:00:00 0 0 0 0 0 1
XC321 2016-04-28 00:00:00 0 0 1 2 2 1
Here is an answer that works for me. I have based it on your answers above; thanks for the contributions from MT0 and Sagi:
SELECT CustID,
visit_date,
Prev_Visit ,
COUNT( CASE WHEN (Days_between_visits) <=10 THEN 1 END) AS "0-10_day" ,
COUNT( CASE WHEN (Days_between_visits) BETWEEN 11 AND 20 THEN 1 END) AS "11-20_day" ,
COUNT( CASE WHEN (Days_between_visits) BETWEEN 21 AND 30 THEN 1 END) AS "21-30_day" ,
COUNT( CASE WHEN (Days_between_visits) BETWEEN 31 AND 40 THEN 1 END) AS "31-40_day" ,
COUNT( CASE WHEN (Days_between_visits) BETWEEN 41 AND 50 THEN 1 END) AS "41-50_day" ,
COUNT( CASE WHEN (Days_between_visits) >50 THEN 1 END) AS "51+_day"
FROM
(SELECT CustID,
visit_date,
Lead(T1.visit_date) over (partition BY T1.CustID order by T1.visit_date DESC) AS Prev_visit,
visit_date - Lead(T1.visit_date) over (
partition BY T1.CustID order by T1.visit_date DESC) AS Days_between_visits
FROM T1
) T2
WHERE Days_between_visits >0
GROUP BY T2.CustID ,
T2.visit_date ,
T2.Prev_visit ,
T2.Days_between_visits;
This returns:
CUSTID | VISIT_DATE | PREV_VISIT | DAYS_BETWEEN_VISIT | 0-10_DAY | 11-20_DAY | 21-30_DAY | 31-40_DAY | 41-50_DAY | 51+DAY
XC321 | 2016-05-28 | 2016-04-28 | 30 | | | 1 | | |
XC321 | 2016-06-02 | 2016-05-28 | 5 | 1 | | | | |