Calculating working minutes for Normal and Night Shift - SQL

I am making a query to fetch the working minutes for employees. The problem I have is the Night Shift. I know that I need to subtract the "ShiftStartMinutesFromMidnight" but I can't find the right logic.
NOTE: I can't change the database; I can only use the data from it.
Let's say I have these records.
+----+-------------------------+----------+
| ID | EventTime               | ReaderNo |
+----+-------------------------+----------+
| 1  | 2019-12-04 11:28:46.000 | In       |
| 1  | 2019-12-04 12:36:17.000 | Out      |
| 1  | 2019-12-04 12:39:23.000 | In       |
| 1  | 2019-12-04 12:51:21.000 | Out      |
| 1  | 2019-12-05 07:37:49.000 | In       |
| 1  | 2019-12-05 08:01:22.000 | Out      |
| 2  | 2019-12-04 22:11:46.000 | In       |
| 2  | 2019-12-04 23:06:17.000 | Out      |
| 2  | 2019-12-04 23:34:23.000 | In       |
| 2  | 2019-12-05 01:32:21.000 | Out      |
| 2  | 2019-12-05 01:38:49.000 | In       |
| 2  | 2019-12-05 06:32:22.000 | Out      |
+----+-------------------------+----------+
WITH CT AS (SELECT
    EIn.PSNID, EIn.PSNNAME
    ,CAST(DATEADD(minute, -0, EIn.EventTime) AS date) AS dt
    ,EIn.EventTime AS LogIn
    ,CA_Out.EventTime AS LogOut
    ,DATEDIFF(minute, EIn.EventTime, CA_Out.EventTime) AS WorkingMinutes
FROM
    VIEW_EVENT_EMPLOYEE AS EIn
    CROSS APPLY
    (
        SELECT TOP(1) EOut.EventTime
        FROM VIEW_EVENT_EMPLOYEE AS EOut
        WHERE
            EOut.PSNID = EIn.PSNID
            AND EOut.ReaderNo = 'Out'
            AND EOut.EventTime >= EIn.EventTime
        ORDER BY EOut.EventTime
    ) AS CA_Out
WHERE
    EIn.ReaderNo = 'In'
)
SELECT
    PSNID
    ,PSNNAME
    ,dt
    ,LogIn
    ,LogOut
    ,WorkingMinutes
FROM CT
WHERE dt BETWEEN '2019-11-29' AND '2019-12-05'
ORDER BY LogIn
;
OUTPUT FROM QUERY
+----+------------+-------------------------+-------------------------+----------------+
| ID | date       | In                      | Out                     | WorkingMinutes |
+----+------------+-------------------------+-------------------------+----------------+
| 1  | 2019-12-04 | 2019-12-04 11:28:46.000 | 2019-12-04 12:36:17.000 | 68             |
| 1  | 2019-12-04 | 2019-12-04 12:39:23.000 | 2019-12-04 12:51:21.000 | 12             |
| 1  | 2019-12-05 | 2019-12-05 07:37:49.000 | 2019-12-05 08:01:22.000 | 24             |
+----+------------+-------------------------+-------------------------+----------------+
I was thinking of something like this: treat it as a night shift when the Out time is between 06:25 and 06:40, but also check whether the same employee has an In between 21:50 and 22:30 on the previous day. I need that second condition because an employee from the first shift might also clock Out at, for example, 6:30.
(1310 is the ShiftStartMinutesFromMidnight; 1310 minutes after midnight is 21:50, the night-shift start.)
Line 3 of the query is:
CAST(DATEADD(minute, -0, EIn.EventTime) AS date) AS dt
I am updating line 3 with this code:
CASE
    WHEN CAST(CA_Out.EventTime AS time) BETWEEN '06:25:00' AND '06:40:00'
         AND CAST(EIn.EventTime AS time) BETWEEN '21:50:00' AND '22:30:00'
        THEN CAST(DATEADD(minute, -1310, EIn.EventTime) AS date)
    ELSE CAST(DATEADD(minute, -0, EIn.EventTime) AS date)
END AS dt
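For the sample data this maps employee 2's night-shift punches back to 2019-12-04. A quick sanity check of the offset (a sketch, using the 1310 value from the note above):

-- 1310 minutes = 21:50, the shift start measured from midnight.
-- Subtracting it pulls every punch of the 2019-12-04 night shift back onto 2019-12-04:
SELECT CAST(DATEADD(minute, -1310, CAST('2019-12-05 06:32:22' AS datetime)) AS date) AS dt;
-- returns 2019-12-04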
Expected Output
+----+------------+-------------------------+-------------------------+----------------+
| ID | date       | In                      | Out                     | WorkingMinutes |
+----+------------+-------------------------+-------------------------+----------------+
| 2  | 2019-12-04 | 2019-12-04 22:11:46.000 | 2019-12-04 23:06:17.000 | 55             |
| 2  | 2019-12-04 | 2019-12-04 23:34:23.000 | 2019-12-05 01:32:21.000 | 118            |
| 2  | 2019-12-04 | 2019-12-05 01:38:49.000 | 2019-12-05 06:32:22.000 | 294            |
+----+------------+-------------------------+-------------------------+----------------+

Assuming that total minutes per separate date is enough:
WITH
/* enumerate pairs */
cte1 AS ( SELECT *,
                 COUNT(CASE WHEN ReaderNo = 'In' THEN 1 END)
                     OVER (PARTITION BY ID
                           ORDER BY EventTime) pair
          FROM test ),
/* divide by pairs */
cte2 AS ( SELECT ID, MIN(EventTime) starttime, MAX(EventTime) endtime
          FROM cte1
          GROUP BY ID, pair ),
/* get dates range */
cte3 AS ( SELECT CAST(MIN(EventTime) AS DATE) minDate,
                 CAST(MAX(EventTime) AS DATE) maxDate
          FROM test ),
/* generate dates list */
cte4 AS ( SELECT minDate theDate
          FROM cte3
          UNION ALL
          SELECT DATEADD(dd, 1, theDate)
          FROM cte3, cte4
          WHERE theDate < maxDate ),
/* add overlapped dates to pairs */
cte5 AS ( SELECT ID, starttime, endtime, theDate
          FROM cte2, cte4
          WHERE theDate BETWEEN CAST(starttime AS DATE) AND CAST(endtime AS DATE) ),
/* adjust borders */
cte6 AS ( SELECT ID,
                 CASE WHEN starttime < theDate
                      THEN theDate
                      ELSE starttime
                 END starttime,
                 CASE WHEN CAST(endtime AS DATE) > theDate
                      THEN DATEADD(dd, 1, theDate)
                      ELSE endtime
                 END endtime,
                 theDate
          FROM cte5 )
/* calculate total minutes per date */
SELECT ID,
       theDate,
       SUM(DATEDIFF(mi, starttime, endtime)) workingminutes
FROM cte6
GROUP BY ID,
         theDate
ORDER BY 1, 2
fiddle
The solution is deliberately detailed, step by step, so that you can easily follow the logic.
You may freely combine some CTEs into one. You may also combine the next-to-last cte5 with cte2 if you need the output exactly as shown.
The solution assumes that no records are missing from the source data (each 'In' matches exactly one 'Out' and vice versa, and there are no adjacent or overlapping pairs).
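If you want to verify that assumption first, a minimal sanity check could look like this (a sketch against the same test table):

SELECT ID,
       COUNT(CASE WHEN ReaderNo = 'In'  THEN 1 END) AS in_count,
       COUNT(CASE WHEN ReaderNo = 'Out' THEN 1 END) AS out_count
FROM test
GROUP BY ID
HAVING COUNT(CASE WHEN ReaderNo = 'In'  THEN 1 END)
    <> COUNT(CASE WHEN ReaderNo = 'Out' THEN 1 END);

Any ID it returns has unbalanced punches and would break the pairing logic above.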

Don't know where you stopped, but here is how I do it.
Night shift 20:00 - 05:00, so within one day: 00:00 - 05:00 and 22:00 - 24:00;
day shift 05:00 - 22:00.
To make the overlap check easier, convert all dates to Unix timestamps; then you don't have to split time intervals as shown above.
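In Postgres, for instance, the conversion is a one-liner (a sketch; other dialects have equivalents such as UNIX_TIMESTAMP in MySQL):

SELECT EXTRACT(EPOCH FROM TIMESTAMP WITH TIME ZONE '2020-02-02 22:00:00+00')::bigint AS unix_ts;
-- 1580680800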
So generate a map of each work period for the fetch period (date_from to date_till); make sure to add holiday and pre-holiday exceptions where the periods differ,
something like:
The Unix values below are illustrative only.
unix_from_tim, unix_till_tim, shift_type
1580680800, 1580680800, 1 => example 02-02-2020:22:00:00, 03-02-2020:05:00:00, 1
1580680800, 1580680800, 0 => example 03-02-2020:05:00:00, 03-02-2020:22:00:00, 0
1580680800, 1580680800, 1 => example 03-02-2020:22:00:00, 04-02-2020:05:00:00, 1
...
Make sure you don't double-count overlapping minutes at a period's start/end.
And for the worker there is one row
with unix_from_tim, unix_till_tim
1580680800, 1580680800 => something like 02-02-2020:16:30:00, 03-02-2020:07:10:00
When you check for overlap, you can get the overlap amount like this:
MIN(work_period:till,worker_period:till) - MAX(work_period:from, worker_period:from);
example in simple numbers:
work_period 3 - 7
worker_period 5 - 12
MIN(7,12) - MAX(3,5) = 7 - 5 = 2 //overlap
work_period 3 - 7
worker_period 8 - 12
MIN(7,12) - MAX(3,8) = 7 - 8 = -1 // negative means no overlap!
work_period 3 - 13
worker_period 8 - 12
MIN(13,12) - MAX(3,8) = 12 - 8 = 4 // full overlap of the worker period!
And you have to check each worker period against all of the generated work intervals that overlap it.
Maybe someone can write a single SELECT where you don't have to generate the work-shift map, but that's not an easy task once you add holidays, transferred days, reduced-time days, etc.
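To make that concrete, here is a sketch of the overlap check in plain SQL, with hypothetical tables work_shift(unix_from_tim, unix_till_tim, shift_type) and worker_period(unix_from_tim, unix_till_tim); GREATEST/LEAST exist in Postgres, MySQL and Oracle (SQL Server added them in 2022):

-- overlap in minutes between each worker period and each generated work interval
SELECT w.unix_from_tim AS worker_from,
       s.shift_type,
       (LEAST(s.unix_till_tim, w.unix_till_tim)
        - GREATEST(s.unix_from_tim, w.unix_from_tim)) / 60 AS overlap_minutes
FROM worker_period w
JOIN work_shift s
  ON LEAST(s.unix_till_tim, w.unix_till_tim)
     > GREATEST(s.unix_from_tim, w.unix_from_tim);  -- keep only positive overlaps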
Hope it helps


Generate private SEQUENCE for each PARTITION

I have a table of matchups in different games, and I would like to calculate how dense the matchup space is for each game. Example table:
id | game | start_dt
---+-------+-----------------
1 | dota2 | 2020-01-01 15:00
---+-------+-----------------
2 | dota2 | 2020-01-01 15:05
---+-------+-----------------
3 | dota2 | 2020-01-01 18:00
---+-------+-----------------
4 | cs-go | 2020-01-01 13:05
---+-------+-----------------
5 | cs-go | 2020-01-01 13:15
---+-------+-----------------
6 | dota2 | 2020-01-01 12:00
---+-------+-----------------
7 | cs-go | 2020-01-01 14:45
Would ideally yield:
id | game | start_dt | time_group_id
---+-------+-----------------+---------------
6 | dota2 | 2020-01-01 12:00| 1
---+-------+-----------------+---------------
1 | dota2 | 2020-01-01 15:00| 2
---+-------+-----------------+---------------
2 | dota2 | 2020-01-01 15:05| 2
---+-------+-----------------+---------------
3 | dota2 | 2020-01-01 18:00| 3
---+-------+-----------------+---------------
4 | cs-go | 2020-01-01 13:05| 4
---+-------+-----------------+---------------
5 | cs-go | 2020-01-01 13:15| 4
---+-------+-----------------+---------------
7 | cs-go | 2020-01-01 14:45| 5
Which basically means that if the gap between one game and the next is 10 minutes or less, they are considered part of the same time group; otherwise a new time group starts.
Those time_group_ids are then used to map useful information about matches and their time frequency.
My code is below, and it serves the purpose, however it doesn't give evenly spaced ids, so I have to use a composite of the game VARCHAR and group_id for the field to uniquely represent a group. Please run it in the db fiddle to understand what I mean.
CREATE TABLE fight(
    id BIGSERIAL PRIMARY KEY,
    date TIMESTAMP NOT NULL,
    game VARCHAR NOT NULL
);
INSERT INTO fight(date, game)
VALUES
('2020-01-01 15:00'::TIMESTAMP, 'dota2'),
('2020-01-01 15:05'::TIMESTAMP, 'dota2'),
('2020-01-01 18:00'::TIMESTAMP, 'dota2'),
('2020-01-01 13:05'::TIMESTAMP, 'cs-go'),
('2020-01-01 13:15'::TIMESTAMP, 'cs-go'),
('2020-01-01 12:00'::TIMESTAMP, 'dota2'),
('2020-01-01 14:45'::TIMESTAMP, 'cs-go');
SELECT * FROM fight;
CREATE SEQUENCE seq START 1 CACHE 1;
SELECT
    a.id,
    a.game,
    a.start_dt,
    (CASE WHEN (a.start_dt - INTERVAL '10 min' <= a.prev_start_dt) THEN currval('seq')
          ELSE nextval('seq')
     END)::VARCHAR || '|' || a.game AS time_group_id
FROM
    (
     SELECT
         fight.id,
         fight.game,
         fight.date AS start_dt,
         LAG (fight.date, 1, fight.date) OVER (PARTITION BY fight.game ORDER BY fight.date) AS prev_start_dt
     FROM fight CROSS JOIN (SELECT setval('seq', 1)) s
    ) a
ORDER BY a.game, a.start_dt;
The question is: is there the ideal way to do this, or should I stick with what I got?
You don't need a sequence for this, just a cumulative sum:
SELECT f.*,
       COUNT(*) FILTER (WHERE prev_date < date - interval '10 min')
           OVER (ORDER BY date) AS time_group_id
FROM (SELECT f.*,
             LAG(f.date) OVER (PARTITION BY f.game ORDER BY f.date) AS prev_date
      FROM fight f
     ) f;
Notes: This might start at 0 rather than 1. If that makes a difference, use 1 +.
This produces a number, not a string. You can convert to a string (using ::text) if that is what you really need.
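For example, a 1-based variant of the same expression (sketch):

SELECT f.*,
       1 + COUNT(*) FILTER (WHERE prev_date < date - interval '10 min')
               OVER (ORDER BY date) AS time_group_id
FROM (SELECT f.*,
             LAG(f.date) OVER (PARTITION BY f.game ORDER BY f.date) AS prev_date
      FROM fight f
     ) f;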
Here is a db<>fiddle
SELECT
    b.id,
    b.game,
    b.start_dt,
    sum(b.time_group_count) OVER (ORDER BY b.game, b.start_dt) AS time_group_id
FROM
    (SELECT
         a.id,
         a.game,
         a.start_dt,
         CASE WHEN a.prev_start_dt IS NULL THEN 1
              WHEN (a.start_dt - INTERVAL '10 min' <= a.prev_start_dt) THEN 0
              ELSE 1
         END AS time_group_count
     FROM
         (
          SELECT
              fight.id,
              fight.game,
              fight.date AS start_dt,
              LAG (fight.date, 1) OVER (PARTITION BY fight.game ORDER BY fight.date) AS prev_start_dt
          FROM fight
         ) a
     ORDER BY a.game, a.start_dt) b;
This query is what gave me the results I really wanted. Really grateful for the cumulative-sum idea by @Gordon Linoff, thank you!

Is there a way to automatically add the date difference we get to a date?

What I am trying to do: I have two dates and use DATEDIFF to get the difference between them. For example, I have a planned start date and an actual start date, and the difference between them is 5 days; now I want to add those days to the finish date.
If my finish date is not what I assumed but is behind schedule, then I want to add that difference to find the new expected finish date, i.e. the next upcoming dates.
Sum(DATEDIFF(day, sa.PlannedStartDate, sa.ActualStartDate))
    OVER (Partition By ts.Id) as TotalVariance,
Case when (Sum(DATEDIFF(day, sa.PlannedStartDate, sa.ActualStartDate))
    OVER (Partition By ts.Id) > 30) then 'Positive' end as Violation,
-- note: DATEADD is missing its third (date) argument here
DATEADD(day, DATEDIFF(day, sa.PlannedStartDate, sa.ActualStartDate)) as SummarViolations,
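(For reference, DATEADD needs the date being shifted as its third argument; with a hypothetical PlannedFinishDate column the expression would be:)

-- PlannedFinishDate is a hypothetical column name, for illustration only
DATEADD(day,
        DATEDIFF(day, sa.PlannedStartDate, sa.ActualStartDate),
        sa.PlannedFinishDate) AS NewPlannedFinishDate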
If activity 1's planned start date is 8/21/2019 but the actual start date is 9/21/2019, then in this case we are about 30 days behind.
Now the next activity will be delayed, so I want to add this difference to the next activity.
If the second activity's planned start date was 08/25/2019, but because of the delay of activity 1 the start date will change for the second activity, then I want to find that new date.
+------------+------------------+-----------------+----------+---------------------+
| Activity   | PlannedStartDate | ActualStartDate | Variance | NewPlannedStartDate |
+------------+------------------+-----------------+----------+---------------------+
| Activity 1 | 8/21/2019        | 9/21/2019       | 30       |                     |
| Activity 2 | 8/26/2019        | null            |          | 9/26/2019           |
+------------+------------------+-----------------+----------+---------------------+
Here's an example you can run in SSMS:
-- CREATE ACTIVITY TABLE AND ADD SOME DATA --
DECLARE @Activity TABLE ( ActivityId INT, PlannedStart DATE, ActualStart DATE );

INSERT INTO @Activity (
    ActivityId, PlannedStart, ActualStart
)
VALUES
    ( 1, '08/21/2019', '08/27/2019' ), ( 1, '08/26/2019', NULL ), ( 1, '09/14/2019', NULL );
Query @Activity to see what's in it:
SELECT * FROM @Activity ORDER BY ActivityId, PlannedStart;
@Activity content:
+------------+--------------+-------------+
| ActivityId | PlannedStart | ActualStart |
+------------+--------------+-------------+
| 1 | 2019-08-21 | 2019-08-27 |
| 1 | 2019-08-26 | NULL |
| 1 | 2019-09-14 | NULL |
+------------+--------------+-------------+
Query @Activity to factor the new starting dates:
DECLARE @ActivityId INT = 1;  -- the activity to examine (declaration added so the query runs)

;WITH Activity_CTE AS (
    SELECT
        ROW_NUMBER() OVER ( ORDER BY PlannedStart ) AS Id,
        ActivityId, PlannedStart, ActualStart, DATEDIFF( dd, PlannedStart, ActualStart ) Delayed
    FROM @Activity
    WHERE
        ActivityId = @ActivityId
)
SELECT
ActivityId,
PlannedStart,
ActualStart,
DATEADD( dd, Delays.DaysDelayed, PlannedStart ) AS NewStart
FROM Activity_CTE AS Activity
OUTER APPLY (
SELECT CASE
WHEN ( Delayed IS NOT NULL ) THEN Delayed
ELSE ISNULL( ( SELECT TOP 1 Delayed FROM Activity_CTE WHERE Id < Activity.Id AND Delayed IS NOT NULL ORDER BY Id DESC ), 0 )
END AS DaysDelayed
) AS Delays
ORDER BY
PlannedStart;
Returns
+------------+--------------+-------------+------------+
| ActivityId | PlannedStart | ActualStart | NewStart |
+------------+--------------+-------------+------------+
| 1 | 2019-08-21 | 2019-08-27 | 2019-08-27 |
| 1 | 2019-08-26 | NULL | 2019-09-01 |
| 1 | 2019-09-14 | NULL | 2019-09-20 |
+------------+--------------+-------------+------------+
The real "magic" here is this line:
ELSE ISNULL( ( SELECT TOP 1 Delayed FROM Activity_CTE WHERE Id < Activity.Id AND Delayed IS NOT NULL ORDER BY Id DESC ), 0 )
It's checking for any records prior to itself that have a delay. If none are found, it returns 0. This value is then used to add days to the PlannedStart date to determine the NewStart date. The ORDER BY is of particular note too: sorting in DESC order ensures we get the "closest" delay prior to the current row.
Using a CTE in this way also takes into account the idea that the delay may not happen on the very first record (e.g., say the 08/26 planned was delayed instead of 08/21). It conveniently gives us a subtable to query against in our OUTER APPLY.
This is what you would see if you included all columns on the CTE's SELECT:
+----+------------+--------------+-------------+---------+-------------+
| Id | ActivityId | PlannedStart | ActualStart | Delayed | DaysDelayed |
+----+------------+--------------+-------------+---------+-------------+
| 1 | 1 | 2019-08-21 | 2019-08-27 | 6 | 6 |
| 2 | 1 | 2019-08-26 | NULL | NULL | 6 |
| 3 | 1 | 2019-09-14 | NULL | NULL | 6 |
+----+------------+--------------+-------------+---------+-------------+
Because the very first record is the only record with a delay, its delay of 6 days persists through each of the following records.
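To see the other case mentioned above in action (the delay landing on a later record, e.g. on 08/26 instead of 08/21), one could swap the test data around; a sketch:

-- Same table shape; the delay now sits on the second record instead of the first
DECLARE @Activity TABLE ( ActivityId INT, PlannedStart DATE, ActualStart DATE );
INSERT INTO @Activity ( ActivityId, PlannedStart, ActualStart )
VALUES
    ( 1, '08/21/2019', NULL ), ( 1, '08/26/2019', '09/01/2019' ), ( 1, '09/14/2019', NULL );
-- With the same query as above, the first row keeps its PlannedStart (no prior delay),
-- while the 6-day delay found on 08/26 pushes the later rows out:
-- 08/21 -> 08/21, 08/26 -> 09/01, 09/14 -> 09/20.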

How to create a table that loops over data in Postgres

I want to create a table that returns the top 10 cons_name by aggregate sum over a given week, repeated for every day.
So for 5/29/2019 it will pull the top 10 cons_name by their sum dating back to 5/22/2019.
Then, for 5/28/2019, the top 10 cons_name by their sum back to 5/21/2019.
A table of top 10 dating back 7 days all the way to 2018-12-01.
I can write the simple query dating back 7 days, but I have tried window functions to no avail.
SELECT cons_name,
       pricedate,
       sum(shadow)
FROM spp.rtbinds
WHERE pricedate >= current_date - 7
GROUP BY cons_name, shadow, pricedate
ORDER BY shadow ASC
LIMIT 10
This query generates the output below
cons_name pricedate sum
"TEMP17_24078" "2019-05-28 00:00:00" "-1473.29723333333"
"TEMP17_24078" "2019-05-28 00:00:00" "-1383.56638333333"
"TMP175_24736" "2019-05-23 00:00:00" "-1378.40504166667"
"TMP159_24149" "2019-05-23 00:00:00" "-1328.847675"
"TMP397_24836" "2019-05-23 00:00:00" "-1221.19560833333"
"TEMP17_24078" "2019-05-28 00:00:00" "-1214.9914"
"TMP175_24736" "2019-05-23 00:00:00" "-1123.83254166667"
"TEMP72_22893" "2019-05-29 00:00:00" "-1105.93840833333"
"TMP164_23704" "2019-05-24 00:00:00" "-1053.051375"
"TMP175_24736" "2019-05-27 00:00:00" "-1043.52104166667"
I would like a table and function that returns a table of each day's top 10 dating back a week.
Using window functions gets you on the right track, but you should read further in the documentation about the possibilities.
We have multiple issues here that we need to solve:
gaps in the data (missing pricedate values) would keep us from getting the correct number of rows (7) for calculating the overall sum
for the calculation itself we need all data rows, so the WHERE clause cannot be used to limit the result to only the visible days
in order to select the top 10 for each day, we have to generate a row number per partition, because the LIMIT clause cannot be applied per group
This is why I came up with the following CTEs:
CTE days: generate the gap-less date series and mark visible days
CTE daily: LEFT JOIN the data to the generated days and produce daily sums (and handle NULL entries)
CTE calc: produce the cumulative sums
CTE numbered: produce row numbers reset each day
select the actual visible rows and limit them to max. 10 per day
So for a specific week (2019-05-26 - 2019-06-01), the query will look like the following:
WITH
days (c_day, c_visible, c_lookback) as (
SELECT gen::date, (CASE WHEN gen::date < '2019-05-26' THEN false ELSE true END), gen::date - 6
FROM generate_series('2019-05-26'::date - 6, '2019-06-01'::date, '1 day'::interval) AS gen
),
daily (cons_name, pricedate, shadow_sum) AS (
SELECT
r.cons_name,
r.pricedate::date,
coalesce(sum(r.shadow), 0)
FROM days
LEFT JOIN spp.rtbinds AS r ON (r.pricedate::date = days.c_day)
GROUP BY 1, 2
),
calc (cons_name, pricedate, shadow_sum) AS (
SELECT
cons_name,
pricedate,
sum(shadow_sum) OVER (PARTITION BY cons_name ORDER BY pricedate ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
FROM daily
),
numbered (cons_name, pricedate, shadow_sum, position) AS (
SELECT
calc.cons_name,
calc.pricedate,
calc.shadow_sum,
ROW_NUMBER() OVER (PARTITION BY calc.pricedate ORDER BY calc.shadow_sum DESC)
FROM calc
)
SELECT
days.c_lookback,
numbered.cons_name,
numbered.shadow_sum
FROM numbered
INNER JOIN days ON (days.c_day = numbered.pricedate AND days.c_visible)
WHERE numbered.position < 11
ORDER BY numbered.pricedate DESC, numbered.shadow_sum DESC;
Online example with generated test data: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=a83a52e33ffea3783207e6b403bc226a
Example output:
c_lookback | cons_name | shadow_sum
------------+--------------+------------------
2019-05-26 | TMP400_27000 | 4578.04474575352
2019-05-26 | TMP700_25000 | 4366.56857151864
2019-05-26 | TMP200_24000 | 3901.50325547671
2019-05-26 | TMP400_24000 | 3849.39595793188
2019-05-26 | TMP700_28000 | 3763.51693260809
2019-05-26 | TMP600_26000 | 3751.72016620729
2019-05-26 | TMP500_28000 | 3610.75970225036
2019-05-26 | TMP300_26000 | 3598.36888491176
2019-05-26 | TMP600_27000 | 3583.89777677553
2019-05-26 | TMP300_21000 | 3556.60386707587
2019-05-25 | TMP400_27000 | 4687.20302128047
2019-05-25 | TMP200_24000 | 4453.61603102228
2019-05-25 | TMP700_25000 | 4319.10566615313
2019-05-25 | TMP400_24000 | 4039.01832416654
2019-05-25 | TMP600_27000 | 3986.68667223025
2019-05-25 | TMP600_26000 | 3879.92447655788
2019-05-25 | TMP700_28000 | 3632.56970774056
2019-05-25 | TMP800_25000 | 3604.1630071504
2019-05-25 | TMP600_28000 | 3572.50801157858
2019-05-25 | TMP500_27000 | 3536.57885829499
2019-05-24 | TMP400_27000 | 5034.53660146287
2019-05-24 | TMP200_24000 | 4646.08844632655
2019-05-24 | TMP600_26000 | 4377.5741555281
2019-05-24 | TMP700_25000 | 4321.11906399066
2019-05-24 | TMP400_24000 | 4071.37184911687
2019-05-24 | TMP600_25000 | 3795.00857752701
2019-05-24 | TMP700_26000 | 3518.6449117614
2019-05-24 | TMP600_24000 | 3368.15348120732
2019-05-24 | TMP200_25000 | 3305.84444172308
2019-05-24 | TMP500_28000 | 3162.57388606668
2019-05-23 | TMP400_27000 | 4057.08620966971
2019-05-23 | TMP700_26000 | 4024.11812392669
...
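Since the question also asked for a function, the query above can be wrapped into a set-returning function so the week boundaries become parameters. A sketch, assuming cons_name is text and shadow is numeric (adjust the declared types to the actual columns; the function name is illustrative):

CREATE FUNCTION top10_per_day(p_from date, p_till date)
RETURNS TABLE (c_lookback date, cons_name text, shadow_sum numeric)
LANGUAGE sql STABLE AS
$$
WITH
days (c_day, c_visible, c_lookback) AS (
    SELECT gen::date, gen::date >= p_from, gen::date - 6
    FROM generate_series(p_from - 6, p_till, '1 day'::interval) AS gen
),
daily (cons_name, pricedate, shadow_sum) AS (
    SELECT r.cons_name, r.pricedate::date, coalesce(sum(r.shadow), 0)
    FROM days
    LEFT JOIN spp.rtbinds AS r ON (r.pricedate::date = days.c_day)
    GROUP BY 1, 2
),
calc (cons_name, pricedate, shadow_sum) AS (
    SELECT cons_name, pricedate,
           sum(shadow_sum) OVER (PARTITION BY cons_name ORDER BY pricedate
                                 ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
    FROM daily
),
numbered (cons_name, pricedate, shadow_sum, position) AS (
    SELECT calc.cons_name, calc.pricedate, calc.shadow_sum,
           ROW_NUMBER() OVER (PARTITION BY calc.pricedate ORDER BY calc.shadow_sum DESC)
    FROM calc
)
SELECT days.c_lookback, numbered.cons_name, numbered.shadow_sum
FROM numbered
INNER JOIN days ON (days.c_day = numbered.pricedate AND days.c_visible)
WHERE numbered.position < 11
ORDER BY numbered.pricedate DESC, numbered.shadow_sum DESC;
$$;

-- usage:
SELECT * FROM top10_per_day(DATE '2019-05-26', DATE '2019-06-01');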

Oracle SQL Join Data Sequentially

I am trying to track the usage of material with my SQL. There is no way in our database to link a part's usage back to the order it originally came from. A part simply ends up in a bin after an order arrives, and usage of parts just creates a record for the number of parts used at the time of the transaction. I am attempting, as best I can, to link usage to an order number by summing over the data and sequentially assigning it to order numbers.
My subqueries have gotten me this far. Each order number is received on a date. I then join the usage table records based on the USEDATE needing to be equal to or greater than the RECEIVEDATE of the order. The data produced by this is as such:
| ORDERNUM | PARTNUM | RECEIVEDATE | ORDERQTY | USEQTY | USEDATE |
|----------|----------|-------------------------|-----------|---------|------------------------|
| 4412 | E1125 | 10/26/2016 1:32:25 PM | 1 | 1 | 11/18/2016 1:40:55 PM |
| 4412 | E1125 | 10/26/2016 1:32:25 PM | 1 | 3 | 12/26/2016 2:19:32 PM |
| 4412 | E1125 | 10/26/2016 1:32:25 PM | 1 | 1 | 1/3/2017 8:31:21 AM |
| 4111 | E1125 | 10/28/2016 2:54:13 PM | 1 | 1 | 11/18/2016 1:40:55 PM |
| 4111 | E1125 | 10/28/2016 2:54:13 PM | 1 | 3 | 12/26/2016 2:19:32 PM |
| 4111 | E1125 | 10/28/2016 2:54:13 PM | 1 | 1 | 1/3/2017 8:31:21 AM |
| 0393 | E1125 | 12/22/2016 11:52:04 AM | 3 | 3 | 12/26/2016 2:19:32 PM |
| 0393 | E1125 | 12/22/2016 11:52:04 AM | 3 | 1 | 1/3/2017 8:31:21 AM |
| 7812 | E1125 | 12/27/2016 10:56:01 AM | 1 | 1 | 1/3/2017 8:31:21 AM |
| 1191 | E1125 | 1/5/2017 1:12:01 PM | 2 | 0 | null |
The query for the above section looks as such:
SELECT
    B.*,
    NVL(B2.QTY, '0') USEQTY,
    B2.USEDATE USEDATE
FROM <<Sub Query B>>
LEFT JOIN USETABLE B2 ON B.PARTNUM = B2.PARTNUM AND B2.USEDATE >= B.RECEIVEDATE
My ultimate goal here is to join USEQTY records sequentially until they have filled enough ORDERQTYs. I also need to add an ORDERUSE column that represents what quantity from the USEQTY column was actually applied to that record. Not really sure how to word this any better, so here is an example of what I need to happen based on the table above:
| ORDERNUM | PARTNUM | RECEIVEDATE | ORDERQTY | USEQTY | USEDATE | ORDERUSE |
|----------|----------|-------------------------|-----------|---------|------------------------|-----------|
| 4412 | E1125 | 10/26/2016 1:32:25 PM | 1 | 1 | 11/18/2016 1:40:55 PM | 1 |
| 4111 | E1125 | 10/28/2016 2:54:13 PM | 1 | 3 | 12/26/2016 2:19:32 PM | 1 |
| 0393 | E1125 | 12/22/2016 11:52:04 AM | 3 | 2 | 12/26/2016 2:19:32 PM | 2 |
| 0393 | E1125 | 12/22/2016 11:52:04 AM | 3 | 1 | 1/3/2017 8:31:21 AM | 1 |
| 7812 | E1125 | 12/27/2016 10:56:01 AM | 1 | 0 | null | 0 |
| 1191 | E1125 | 1/5/2017 1:12:01 PM | 2 | 0 | null | 0 |
If I can get the query to pull the information like above, I will then be able to group the records together and sum the ORDERUSE column, which would tell me which orders have been fully used and which have not. So in the example above, if I were to sum the ORDERUSE column for each of the ORDERNUMs, orders 4412, 4111 and 0393 would all show full usage, while orders 7812 and 1191 would show as not fully used.
If I am reading this correctly, you want to determine how many parts have been used. In your example it looks like you have 5 usages against 5 orders, coming to a total of 8 parts, with the following orders having been used:
4412 - one part - one used
4111 - one part - one used
7812 - one part - one used
0393 - three parts - two used
After a bit of hacking away I came up with the following SQL. I'm not sure if this works outside of your sample data, since that's the only thing I used to test, and I am no expert.
WITH data AS (
    SELECT *
    FROM (SELECT *
          FROM sub_b1
          JOIN (SELECT ROWNUM rn
                FROM dual
                CONNECT BY LEVEL < 15) a
            ON a.rn <= sub_b1.orderqty
          ORDER BY receivedate)
    WHERE ROWNUM <= (SELECT SUM(useqty)
                     FROM sub_b2))
SELECT sub_b1.ordernum,
       partnum,
       receivedate,
       orderqty,
       usage
FROM sub_b1
JOIN (SELECT ordernum,
             MAX(rn) AS usage
      FROM data
      GROUP BY ordernum) b
  ON sub_b1.ordernum = b.ordernum
You are looking for "FIFO" inventory accounting.
The proper data model should have two tables, one for "received" parts and the other for "delivered" or "used" parts. Each table should show an order number, a part number and quantity (received or used) for that order, and a timestamp or date-time. I model both as CTEs in my query below, but in your business they should be two separate tables. Also, a trigger or similar should enforce the constraint that a part cannot be used until it is available in stock (that is: for each part id, the total quantity used since inception, at any point in time, should not exceed the total quantity received since inception, also at the same point in time). I assume that the two input tables do, in fact, satisfy this condition, and I don't check it in the solution.
The output shows a timeline of quantity used, by timestamp, matching "received" and "delivered" (used) quantities for each part_id. In the sample data I illustrate a single part_id, but the query will work with multiple part_ids, and with orders (both received and delivered/used) that include multiple parts (part ids) with different quantities.
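As a sketch of that consistency check (Oracle syntax, reusing the received/delivered names from the test data below), a running balance per part_id must never go negative:

-- Running stock balance per part_id; a negative on_hand means "used before received".
-- Receipts sort before deliveries on timestamp ties (qty desc).
with movements (part_id, ts, qty) as (
  select part_id, ts,  qty from received
  union all
  select part_id, ts, -qty from delivered
)
select part_id, ts,
       sum(qty) over (partition by part_id
                      order by ts, qty desc
                      rows between unbounded preceding and current row) as on_hand
from movements
order by part_id, ts;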
with
received ( order_id, part_id, ts, qty ) as (
select '0030', '11A4', timestamp '2015-03-18 15:00:33', 20 from dual union all
select '0032', '11A4', timestamp '2015-03-22 15:00:33', 13 from dual union all
select '0034', '11A4', timestamp '2015-03-24 10:00:33', 18 from dual union all
select '0036', '11A4', timestamp '2015-04-01 15:00:33', 25 from dual
),
delivered ( order_id, part_id, ts, qty ) as (
select '1200', '11A4', timestamp '2015-03-18 16:30:00', 14 from dual union all
select '1210', '11A4', timestamp '2015-03-23 10:30:00', 8 from dual union all
select '1220', '11A4', timestamp '2015-03-23 11:30:00', 7 from dual union all
select '1230', '11A4', timestamp '2015-03-23 11:30:00', 4 from dual union all
select '1240', '11A4', timestamp '2015-03-26 15:00:33', 1 from dual union all
select '1250', '11A4', timestamp '2015-03-26 16:45:11', 3 from dual union all
select '1260', '11A4', timestamp '2015-03-27 10:00:33', 2 from dual union all
select '1270', '11A4', timestamp '2015-04-03 15:00:33', 16 from dual
),
-- (end of test data; the SQL query begins below - just add the word WITH at the top)
-- with
combined ( part_id, rec_ord, rec_ts, rec_sum, del_ord, del_ts, del_sum ) as (
  select part_id, order_id, ts,
         sum(qty) over (partition by part_id order by ts, order_id),
         null, cast(null as date), cast(null as number)
  from received
  union all
  select part_id, null, cast(null as date), cast(null as number),
         order_id, ts,
         sum(qty) over (partition by part_id order by ts, order_id)
  from delivered
),
prep ( part_id, rec_ord, del_ord, del_ts, qty_sum ) as (
  select part_id, rec_ord, del_ord, del_ts, coalesce(rec_sum, del_sum)
  from combined
)
select part_id,
       last_value(rec_ord ignore nulls) over (partition by part_id
                                              order by qty_sum desc) as rec_ord,
       last_value(del_ord ignore nulls) over (partition by part_id
                                              order by qty_sum desc) as del_ord,
       last_value(del_ts ignore nulls) over (partition by part_id
                                             order by qty_sum desc) as used_date,
       qty_sum - lag(qty_sum, 1, 0) over (partition by part_id
                                          order by qty_sum, del_ts) as used_qty
from prep
order by qty_sum
;
Output:
PART_ID REC_ORD DEL_ORD USED_DATE USED_QTY
------- ------- ------- ----------------------------------- ----------
11A4 0030 1200 18-MAR-15 04.30.00.000000000 PM 14
11A4 0030 1210 23-MAR-15 10.30.00.000000000 AM 6
11A4 0032 1210 23-MAR-15 10.30.00.000000000 AM 2
11A4 0032 1220 23-MAR-15 11.30.00.000000000 AM 7
11A4 0032 1230 23-MAR-15 11.30.00.000000000 AM 4
11A4 0032 1230 23-MAR-15 11.30.00.000000000 AM 0
11A4 0034 1240 26-MAR-15 03.00.33.000000000 PM 1
11A4 0034 1250 26-MAR-15 04.45.11.000000000 PM 3
11A4 0034 1260 27-MAR-15 10.00.33.000000000 AM 2
11A4 0034 1270 03-APR-15 03.00.33.000000000 PM 12
11A4 0036 1270 03-APR-15 03.00.33.000000000 PM 4
11A4 0036 21
12 rows selected.
Notes: (1) One needs to be careful if at some moment the cumulative used quantity exactly matches the cumulative received quantity. All rows must be included in all the intermediate results, otherwise there will be bad data in the output; but this may result (as you can see in the output above) in a few rows with a "used quantity" of 0. Depending on how this output is consumed (for further processing, for reporting, etc.) these rows may be left as they are, or they may be discarded in a further outer query with the condition where used_qty > 0.
(2) The last row shows a quantity of 21 with no used_date and no del_ord. This is, in fact, the "current" quantity in stock for that part_id as of the last date in both tables - available for future use. Again, if this is not needed, it can be removed in an outer query. There may be one or more rows like this at the end of the output.

Correlate Sequences of Independent Events - Calculate Time Intersection

We are building a Power BI reporting solution and I (well, Stack) solved one problem, and then the business came up with a new reporting idea. I'm not sure of the best way to approach it, as I know very little about Power BI and the business seems to want quite complex reports.
We have two sequences of events from separate data sources. They both contain independent events occurring to vehicles. One describes which location a vehicle is within; the other describes incident events, which have a reason code for the incident. The business wants to report on time spent in each location for each reason. Vehicles can change location totally independently of the incident events, and the events are actually datetimes occurring at random points throughout the day. Each type of event has a start time/end time and a VehicleID.
Vehicle Location Events
+------------------+-----------+------------+-----------------+----------------+
| LocationDetailID | VehicleID | LocationID | StartDateTime | EndDateTime |
+------------------+-----------+------------+-----------------+----------------+
| 1 | 1 | 1 | 2012-1-1 | 2016-1-1 |
| 2 | 1 | 2 | 2016-1-1 | 2016-4-1 |
| 3 | 1 | 1 | 2016-4-1 | 2016-11-1 |
| 4 | 2 | 1 | 2011-1-1 | 2016-11-1 |
+------------------+-----------+------------+-----------------+----------------+
Vehicle Status Events
+---------+---------------+-------------+-----------+--------------+
| EventID | StartDateTime | EndDateTime | VehicleID | ReasonCodeID |
+---------+---------------+-------------+-----------+--------------+
| 1 | 2012-1-1 | 2013-1-1 | 1 | 1 |
| 2 | 2013-1-1 | 2015-1-1 | 1 | 3 |
| 3 | 2015-1-1 | 2016-5-1 | 1 | 4 |
| 4 | 2016-5-1 | 2016-11-1 | 1 | 2 |
| 5 | 2015-9-1 | 2016-2-1 | 2 | 1 |
+---------+---------------+-------------+-----------+--------------+
Is there any way I can correlate the two streams together and calculate the total time per vehicle, per ReasonCode, per location? This would seem to require relating the two kinds of events, since a change of location may occur partway through a given ReasonCode.
Calculation example for ReasonCodeID 4:
VehicleID 1 is in LocationID 1 from 2012-1-1 to 2016-1-1 and from 2016-4-1 to 2016-11-1.
VehicleID 1 is in LocationID 2 from 2016-1-1 to 2016-4-1.
VehicleID 1 has ReasonCodeID 4 from 2015-1-1 to 2016-5-1.
Therefore the first period in location 1 intersects with 365 days of ReasonCodeID 4 (2015-1-1 to 2016-1-1), and the second period in location 1 intersects with 30 days (2016-4-1 to 2016-5-1).
Location 2 intersects with 91 days of ReasonCodeID 4 (2016-1-1 to 2016-4-1).
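A quick check of those figures (T-SQL):

SELECT DATEDIFF(DAY, '2015-01-01', '2016-01-01') AS loc1_first,   -- 365
       DATEDIFF(DAY, '2016-04-01', '2016-05-01') AS loc1_second,  -- 30
       DATEDIFF(DAY, '2016-01-01', '2016-04-01') AS loc2;         -- 91 (2016 is a leap year)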
Desired output would be the below.
+-----------+--------------+------------+------------+
| VehicleID | ReasonCodeID | LocationID | Total Days |
+-----------+--------------+------------+------------+
| 1 | 1 | 1 | 366 |
| 1 | 3 | 1 | 730 |
| 1 | 4 | 1 | 395 |
| 1 | 4 | 2 | 91 |
| 1 | 2 | 1 | 184 |
| 2 | 1 | 1 | 154 |
+-----------+--------------+------------+------------+
I have created a SQL fiddle that shows the structure here
Vehicles have related tables, and I'm sure the business will want them grouped by vehicle class etc., but if I can understand how to calculate the intersection points in this case, that will give me the basis for the rest of the reporting.
I think this solution requires a CROSS JOIN implementation. The relationship between both tables is many-to-many, which would imply creating a third table to bridge LocationEvents and VehicleStatusEvents, so I think specifying the relationship in the expression is easier.
I use a CROSS JOIN between both tables, then filter the results to keep only the rows whose VehicleID is the same in both tables. I also filter out the rows where the VehicleStatusEvents date range does not intersect the LocationEvents date range.
Once the filtering is done, I add a column to calculate the count of days in each intersection. Finally, the measure sums up the days for each VehicleID, ReasonCodeID and LocationID.
In order to implement the CROSS JOIN you will have to rename the VehicleID, StartDateTime and EndDateTime columns in one of the two tables. This is necessary to avoid ambiguous column name errors.
I rename the columns as follows:
VehicleID : LocationVehicleID and StatusVehicleID
StartDateTime : LocationStartDateTime and StatusStartDateTime
EndDateTime : LocationEndDateTime and StatusEndDateTime
After this you can use CROSSJOIN in the Total Days measure:
Total Days =
SUMX (
FILTER (
ADDCOLUMNS (
FILTER (
CROSSJOIN ( LocationEvents, VehicleStatusEvents ),
LocationEvents[LocationVehicleID] = VehicleStatusEvents[StatusVehicleID]
&& LocationEvents[LocationStartDateTime] <= VehicleStatusEvents[StatusEndDateTime]
&& LocationEvents[LocationEndDateTime] >= VehicleStatusEvents[StatusStartDateTime]
),
"CountOfDays", IF (
[LocationStartDateTime] <= [StatusStartDateTime]
&& [LocationEndDateTime] >= [StatusEndDateTime],
DATEDIFF ( [StatusStartDateTime], [StatusEndDateTime], DAY ),
IF (
[LocationStartDateTime] > [StatusStartDateTime]
&& [LocationEndDateTime] >= [StatusEndDateTime],
DATEDIFF ( [LocationStartDateTime], [StatusEndDateTime], DAY ),
IF (
[LocationStartDateTime] <= [StatusStartDateTime]
&& [LocationEndDateTime] <= [StatusEndDateTime],
DATEDIFF ( [StatusStartDateTime], [LocationEndDateTime], DAY ),
IF (
[LocationStartDateTime] >= [StatusStartDateTime]
&& [LocationEndDateTime] <= [StatusEndDateTime],
DATEDIFF ( [LocationStartDateTime], [LocationEndDateTime], DAY ),
BLANK ()
)
)
)
)
),
LocationEvents[LocationID] = [LocationID]
&& VehicleStatusEvents[ReasonCodeID] = [ReasonCodeID]
),
[CountOfDays]
)
Then in Power BI you can build a matrix (or any other visualization) using this measure.
If you don't understand completely the measure expression, here is the T-SQL translation:
SELECT
dt.VehicleID,
dt.ReasonCodeID,
dt.LocationID,
SUM(dt.Diff) [Total Days]
FROM
(
SELECT
CASE
WHEN a.StartDateTime <= b.StartDateTime AND a.EndDateTime >= b.EndDateTime -- Inside range
THEN DATEDIFF(DAY, b.StartDateTime, b.EndDateTime)
WHEN a.StartDateTime > b.StartDateTime AND a.EndDateTime >= b.EndDateTime -- |-----|*****|....|
THEN DATEDIFF(DAY, a.StartDateTime, b.EndDateTime)
WHEN a.StartDateTime <= b.StartDateTime AND a.EndDateTime <= b.EndDateTime -- |...|****|-----|
THEN DATEDIFF(DAY, b.StartDateTime, a.EndDateTime)
WHEN a.StartDateTime >= b.StartDateTime AND a.EndDateTime <= b.EndDateTime -- |---|****|-----
THEN DATEDIFF(DAY, a.StartDateTime, a.EndDateTime)
END Diff,
a.VehicleID,
b.ReasonCodeID,
a.LocationID --a.StartDateTime, a.EndDateTime, b.StartDateTime, b.EndDateTime
FROM LocationEvents a
CROSS JOIN VehicleStatusEvents b
WHERE a.VehicleID = b.VehicleID
AND
(
(a.StartDateTime <= b.EndDateTime)
AND (a.EndDateTime >= b.StartDateTime)
)
) dt
GROUP BY dt.VehicleID,
dt.ReasonCodeID,
dt.LocationID
Note that in T-SQL you could use an INNER JOIN operator too.
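For example, a sketch of just the pairing step:

-- Equivalent pairing with an explicit INNER JOIN instead of CROSS JOIN + WHERE
SELECT a.VehicleID, a.LocationID, b.ReasonCodeID,
       a.StartDateTime AS LocStart,    a.EndDateTime AS LocEnd,
       b.StartDateTime AS StatusStart, b.EndDateTime AS StatusEnd
FROM LocationEvents a
INNER JOIN VehicleStatusEvents b
    ON  a.VehicleID = b.VehicleID
    AND a.StartDateTime <= b.EndDateTime
    AND a.EndDateTime >= b.StartDateTime;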
Let me know if this helps.
select coalesce(l.VehicleID, s.VehicleID) as VehicleID
      ,s.ReasonCodeID
      ,l.LocationID
      ,sum
       (
           datediff
           (
               day
              ,case when s.StartDateTime > l.StartDateTime then s.StartDateTime else l.StartDateTime end
              ,case when s.EndDateTime < l.EndDateTime then s.EndDateTime else l.EndDateTime end
           )
       ) as TotalDays
from VehicleLocationEvents as l
full join VehicleStatusEvents as s
    on  s.VehicleID = l.VehicleID
    and case when s.StartDateTime > l.StartDateTime then s.StartDateTime else l.StartDateTime end <=
        case when s.EndDateTime < l.EndDateTime then s.EndDateTime else l.EndDateTime end
group by coalesce(l.VehicleID, s.VehicleID)
        ,s.ReasonCodeID
        ,l.LocationID
or
select VehicleID
      ,ReasonCodeID
      ,LocationID
      ,sum(datediff(day, max_StartDateTime, min_EndDateTime)) as TotalDays
from (select coalesce(l.VehicleID, s.VehicleID) as VehicleID
            ,s.ReasonCodeID
            ,l.LocationID
            ,case when s.StartDateTime > l.StartDateTime then s.StartDateTime else l.StartDateTime end as max_StartDateTime
            ,case when s.EndDateTime < l.EndDateTime then s.EndDateTime else l.EndDateTime end as min_EndDateTime
      from VehicleLocationEvents as l
      full join VehicleStatusEvents as s
          on s.VehicleID = l.VehicleID
     ) ls
where max_StartDateTime <= min_EndDateTime
group by VehicleID
        ,ReasonCodeID
        ,LocationID