Recognize if there are not enough resources during a timeframe - SQL

I need an idea how to solve the following problem.
Let's say I have one group with a given timeframe (8:00-12:00) and I can assign resources (people) to it. Each resource can have a custom timeframe (like 9-10, 9-12, 8-12, etc.) and could be assigned multiple times.
Tables
Groups
ID,
TITLE,
START_TIME,
END_TIME,
REQUIRED_PEOPLE:INTEGER
PeopleAssignments
ID,
USER_ID,
GROUP_ID,
START_TIME,
END_TIME
Now I have the rule that at any given time during the group's timeframe there have to be, say, 4 people assigned. Otherwise I want to get a warning.
I am working with Ruby & SQL (Postgres) here.
Is there an elegant way to do this without iterating through the whole timeframe and checking count(assignments) >= REQUIRED_PEOPLE at every step?

You can solve this with only SQL too (if you are interested in such answers).
Range types offer great functions and operators to calculate with.
These solutions will give you rows for the sub-ranges where people are missing from a given group (and they will tell you exactly which sub-range it is & how many people are missing from the required number).
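If range types are new to you, here is a quick demo of the operators both solutions below rely on (the timestamps are made-up examples):

select tsrange('2017-01-01 08:00', '2017-01-01 12:00') && tsrange('2017-01-01 11:00', '2017-01-01 14:00') as overlaps,     -- true
       tsrange('2017-01-01 08:00', '2017-01-01 12:00') @> tsrange('2017-01-01 09:00', '2017-01-01 10:00') as contains,     -- true
       tsrange('2017-01-01 08:00', '2017-01-01 12:00') *  tsrange('2017-01-01 11:00', '2017-01-01 14:00') as intersection, -- [11:00,12:00)
       tsrange('2017-01-01 08:00', '2017-01-01 12:00') -  tsrange('2017-01-01 11:00', '2017-01-01 14:00') as difference;   -- [08:00,11:00)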
The easy way:
You wanted to try something similar to this. You'll need to pick some interval on which the count() is based (I picked 5 minutes):
select g.id group_id, i start_time, i + interval '5 minutes' end_time, g.required_people - count(a.id) missing_people
from groups g
-- stop the series one interval early, so the last window is [end_time - 5 minutes, end_time)
cross join generate_series(g.start_time, g.end_time - interval '5 minutes', interval '5 minutes') i
-- the overlap test must be part of the join condition: in a WHERE clause it would turn
-- the LEFT JOIN into an INNER JOIN and completely uncovered windows would vanish
left join people_assignments a on a.group_id = g.id
and tsrange(a.start_time, a.end_time) && tsrange(i, i + interval '5 minutes')
group by g.id, i
having g.required_people - count(a.id) > 0
order by g.id, i
But note that you won't be able to detect missing sub-ranges shorter than 5 minutes. For example, if user1 has an assignment for 11:00-11:56 and user2 has one for 11:59-13:00, they will appear to be "in" the group for 11:00-13:00 (so the missing sub-range of 11:56-11:59 will go unnoticed).
Note: the shorter the interval you pick, the more precise (and slower!) the results will be.
http://rextester.com/GRC64969
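To see that caveat in action, here is a minimal, assumed data set (IDs and title invented; one group 11:00-13:00 requiring one person). The query above returns no rows for it, even though 11:56-11:59 is uncovered:

insert into groups (id, title, start_time, end_time, required_people)
values (1, 'demo', '2017-01-01 11:00', '2017-01-01 13:00', 1);
insert into people_assignments (id, user_id, group_id, start_time, end_time)
values (1, 1, 1, '2017-01-01 11:00', '2017-01-01 11:56'),
       (2, 2, 1, '2017-01-01 11:59', '2017-01-01 13:00');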
The hard way:
You can accumulate the result on the fly with custom aggregates or recursive CTEs:
with recursive r as (
-- start with "required_people" as "missing_required_people" in the whole range
select 0 iteration,
id group_id,
array[]::int[] used_assignment_ids,
-- build a json map, where keys are the time ranges
-- and values are the number of missing people for that range
jsonb_build_object(tsrange(start_time, end_time), required_people) required_people_per_time_range
from groups
where required_people > 0
and id = 1 -- query parameter
union all
select r.iteration + 1,
r.group_id,
r.used_assignment_ids || a.assignment_id,
d.required_people_per_time_range
from r
-- join a single assignment to the previous iteration, where
-- the assignment's time range overlaps with (at least one) time range
-- where there are still missing people. when no such assignment is
-- found, the "recursion" (which is really just a loop) stops
cross join lateral (
select a.id assignment_id, tsrange(start_time, end_time) time_range
from people_assignments a
cross join (select key::tsrange time_range from jsonb_each(r.required_people_per_time_range)) j
where a.group_id = r.group_id
and a.id <> ALL (r.used_assignment_ids)
and tsrange(start_time, end_time) && j.time_range
limit 1
) a
-- "partition" && accumulate all remaining time ranges with
-- the one found in the previous step
cross join lateral (
-- accumulate "partition" results
select jsonb_object_agg(u.time_range, u.required_people) required_people_per_time_range
from (select key::tsrange time_range, value::int required_people
from jsonb_each_text(r.required_people_per_time_range)) j
cross join lateral (
select u time_range, j.required_people - case when u && a.time_range then 1 else 0 end required_people
-- "partition" the found time range with all existing ones, one-by-one
from unnest(case
when j.time_range @> a.time_range
then array[tsrange(lower(j.time_range), lower(a.time_range)), a.time_range, tsrange(upper(a.time_range), upper(j.time_range))]
when j.time_range && a.time_range
then array[j.time_range * a.time_range, j.time_range - a.time_range]
else array[j.time_range]
end) u
where not isempty(u)
) u
) d
),
-- select only the last iteration
l as (
select group_id, required_people_per_time_range
from r
order by iteration desc
limit 1
)
-- unwind the accumulated json map
select l.group_id, lower(time_range) start_time, upper(time_range) end_time, missing_required_people
from l
cross join lateral (
select key::tsrange time_range, value::int missing_required_people
from jsonb_each_text(l.required_people_per_time_range)
) j
-- select only where there is still some missing people
-- this is optional, if you omit it you'll also see row(s) for sub-ranges where
-- there is enough people in the group (these rows will have zero,
-- or negative amount of "missing_required_people")
where j.missing_required_people > 0
http://rextester.com/GHPD52861

In any case you need to query the number of assignments in the DB. There is no other way to find how many people are assigned to a group.
There might be ways to cache the number of assignments, but in the end you have to fire a query against the DB.
@group = Group.find(id)
if @group.people_assignments.count < @group.required_people
  puts 'warning'
end
You can add an extra column to groups that holds the number of people assigned to that group. This way one query to the server is saved.
@group = Group.find(id)
if @group.count_people_assigned < @group.required_people
  puts 'warning'
end
In the second case count_people_assigned is a column, so no extra query is executed, while in the first case people_assignments is an association, so one extra query is fired.
But in the second case you have to update the group each time you assign people to it - ultimately an extra query as well. Your choice where you want to reduce queries.
My opinion is the second case: assigning happens more rarely than checking.
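If you go the counter-column route, you could also let Postgres keep count_people_assigned in sync itself, so Ruby never has to issue the extra update. A sketch, assuming the column exists on groups (the function and trigger names are invented, not part of the answer above):

create or replace function refresh_count_people_assigned() returns trigger as $$
declare
  gid integer;
begin
  -- the affected group: OLD on DELETE, NEW otherwise
  if tg_op = 'DELETE' then
    gid := old.group_id;
  else
    gid := new.group_id;
  end if;
  update groups
  set count_people_assigned = (select count(*) from people_assignments where group_id = gid)
  where id = gid;
  return null; -- AFTER row trigger: the return value is ignored
end
$$ language plpgsql;

create trigger people_assignments_refresh_count
after insert or update or delete on people_assignments
for each row execute procedure refresh_count_people_assigned();

(Note: if an UPDATE moves an assignment to a different group, this simplified version recounts only the new group; a production version would recount both old.group_id and new.group_id.)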

Related

Sum over a given time period

The following code gives the total duration that a light has been switched on.
CREATE TABLE switch_times (
id SERIAL PRIMARY KEY,
is1 BOOLEAN,
id_dec INTEGER,
label TEXT,
ts TIMESTAMP WITH TIME ZONE default current_timestamp
);
CREATE VIEW makecount AS
SELECT *, row_number() OVER (PARTITION BY id_dec ORDER BY id) AS count
FROM switch_times;
select c1.label, SUM(c2.ts-c1.ts) AS sum
from
(makecount AS c1
inner join
makecount AS c2 ON c2.count = c1.count + 1)
where c2.is1=FALSE AND c1.id_dec = c2.id_dec AND c2.is1 != c1.is1
GROUP BY c1.label;
Link to working demo https://dbfiddle.uk/ZR8pLEBk
Any suggestions on how to alter the code so that it would give the sum over a given specific time period, say the 25th, during which all three lights were switched on for 12 hours?
Problem 1: the current code gives the total sum, as follows.
Problem 2: all durations that have not ended are disregarded, because there is no switch-off time.
label sum
0x29 MH3 1 day 03:00:00
0x2B MH1 1 day 01:00:00
0x2C MH2 1 day 02:00:00
The expected result is just over a given date, i.e.
label sum
0x29 MH3 12:00:00
0x2B MH1 12:00:00
0x2C MH2 12:00:00
Assuming the following (which should be defined in the question):
Postgres 15.
The table is big, many rows per label, performance matters, we can add indexes.
All columns are actually NOT NULL, you just forgot to declare columns as such.
Evey "light" has a distinct id_dec and a distinct label. Having both in switch_times is redundant. (Normalization!)
A light is "switched on" if the most recent earlier entry has is1 IS TRUE. Else it's considered "off".
The order of rows is established by ts, not by id as used in your query (typically incorrect).
Consecutive entries do not have to change the state.
No duplicate entries for (id_dec, ts). (There is a unique index enforcing that; see the DDL sketch right after this list.)
There is no minimum or maximum time interval between entries.
"The 25th" is supposed to mean tstzrange '[2022-11-25 0:0+02, 2022-11-26 0:0+02)' (Note the time zone offsets.)
You want results for all labels that were switched on at all during the given time interval.
There is a table "labels" with one distinct entry per relevant light. If you don't have one, create it.
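The unique index from assumption 8, in case it does not exist yet (the index name is invented):

CREATE UNIQUE INDEX switch_times_id_dec_ts_uni ON switch_times (id_dec, ts);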
Indexes
Have at least these indexes to make everything fast:
CREATE INDEX ON switch_times (id_dec, ts DESC);
CREATE INDEX ON switch_times (ts);
Optional step to create table labels
CREATE TABLE labels AS
WITH RECURSIVE cte AS (
(
SELECT id_dec, label
FROM switch_times
ORDER BY 1
LIMIT 1
)
UNION ALL
(
SELECT s.id_dec, s.label
FROM cte c
JOIN switch_times s ON s.id_dec > c.id_dec
ORDER BY 1
LIMIT 1
)
)
TABLE cte;
ALTER TABLE labels
ADD PRIMARY KEY (id_dec)
, ALTER COLUMN label SET NOT NULL
, ADD CONSTRAINT label_uni UNIQUE (label)
;
Why this way? See:
Optimize GROUP BY query to retrieve latest row per user
Main query
WITH bounds(lo, hi) AS (
SELECT timestamptz '2022-11-25 0:0+02' -- enter time interval here *once*
, timestamptz '2022-11-26 0:0+02'
)
, snapshot AS (
SELECT id_dec, label, is1, ts
FROM switch_times s, bounds b
WHERE s.ts >= b.lo
AND s.ts < b.hi
UNION ALL -- must be separate
SELECT s.*
FROM labels l
JOIN LATERAL ( -- latest earlier entry
SELECT s.id_dec, s.label, s.is1, b.lo AS ts -- cut off at lower bound
FROM switch_times s, bounds b
WHERE s.id_dec = l.id_dec
AND s.ts < b.lo
ORDER BY s.ts DESC
LIMIT 1
) s ON s.is1 -- ... if it's "on"
)
SELECT label, sum(z - a) AS duration
FROM (
SELECT label
, lag(is1, 1, false) OVER w AS last_is1
, lag(ts) OVER w AS a
, ts AS z
FROM snapshot
WINDOW w AS (PARTITION BY label ORDER BY ts ROWS UNBOUNDED PRECEDING)
) sub
WHERE last_is1
GROUP BY 1;
fiddle
CTE bounds is an optional convenience feature to enter lower and upper bound of your time interval once.
CTE snapshot collects all rows of interest, which consists of
all rows inside the time interval (1st leg of UNION ALL query)
the latest earlier row if it was "on" (2nd leg of UNION ALL query)
We need to gather 2. separately to cover corner cases where the light was switched on earlier and there is no entry for the given time interval! But we can replace the timestamp with the lower bound immediately.
The final query gets the previous (is1, ts) for every row in a subquery, defaulting to "off" if there was no previous row.
Finally, sum up intervals in the outer SELECT. Only sum intervals that start in the "on" state (no matter the final state).
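As a minimal illustration of that last step, lag() with a default value (toy rows, invented values):

select ts, is1, lag(is1, 1, false) over (order by ts) as last_is1
from (values (1, true), (2, false), (3, true)) v(ts, is1);
-- the first row gets false instead of NULL, so it is never summed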
Related:
Jump SQL gap over specific condition & proper lead() usage
My assumption:
the actual on-time is the time difference between an is1 = true entry and the next is1 = false entry, ordered by ts.
The query below will calculate the total sum of on-time between two dates:
select
  id_dec,
  label,
  sum(to_timestamp(nexttime) - ts) as time_def
from (
  select
    id_dec,
    "label",
    ts,
    is1,
    case
      when is1 = true then lead(extract(epoch from ts)) over (partition by id_dec order by id_dec, ts asc)
      else 0
    end nexttime
  from switch_times
  where ts between '2022-11-24' and '2022-11-28'
) as a
where nexttime <> 0
group by id_dec, label

TSQL "where ... group by ..." issue that needs solution like "having ..."

I have 3 sub-tables of different formats joined together with unions (if this affects anything) into full-table. There I have the columns "location", "amount" and "time". Then, to keep generality for my later needs, I union full-table with location-table, which has all possible "location" values and nulls in the other fields, into master-table.
I query master-table:
select location, sum(amount)
from master-table
where (time...)
group by location
However some "location" values are dropped because sum(amount) is 0 for those "location"s but I really want to have full list of those "location"s for my further steps.
Alternative would be to use HAVING clause but from what I understand HAVING is impossible here because i filter on "time" while grouping on "location" and I would need to add "time" in grouping which destroys the purpose. Keep in mind that the goal here is to get sum(amount) in each "location"
select location, sum(amount)
from master-table
group by location, time
having (time...)
To view the output:
with the first query I get
loc1, 5
loc3, 10
loc6, 1
but I want to get
loc1, 5
loc2, 0
loc3, 10
loc4, 0
loc5, 0
loc6, 1
Any suggestions on what can be done with this structure of master-table? An alternative solution, which I have no idea how to code, would be to add the numbers from the first query's result to location-table (as a query, not an actual table), producing the final result I've posted above.
What you want will require a complete list of locations, then a left-outer join using that table and your calculated values, and IsNull (for tsql) to ensure you see the 0s you expect. You can do this with some CTEs, which I find valuable for clarity during development, or you can work on "putting it all together" in a more traditional SELECT...FROM... statement. The CTE approach might look like this:
WITH loc AS (
SELECT DISTINCT LocationID
FROM location_table
), summary_data as (
SELECT LocationID, SUM(amount) AS location_sum
FROM master-table
GROUP BY LocationID
)
SELECT loc.LocationID, IsNull(location_sum,0) AS location_sum
FROM loc
LEFT OUTER JOIN summary_data ON loc.LocationID = summary_data.LocationID
See if that gets you a step or two closer to the results you're looking for.
I can think of 2 options:
You could move the WHERE to a CASE WHEN construction:
-- Option 1
select
location,
sum(CASE WHEN time <'16:00' THEN amount ELSE 0 END)
from master_table
group by location
Or you could JOIN with the possible values of location (which is my first ever RIGHT JOIN in a very long time 😉):
-- Option 2
select
x.location,
sum(CASE WHEN m.time <'16:00' THEN m.amount ELSE 0 END)
from master_table m
right join (select distinct location from master_table) x ON x.location = m.location
group by x.location
see: DBFIDDLE
The version using T-SQL without CTEs would be:
SELECT l.location ,
ISNULL(m.location_sum, 0) as location_sum
FROM master-table l
LEFT JOIN (
SELECT location,
SUM(amount) as location_sum
FROM master-table
WHERE (time ... )
GROUP BY location
) m ON l.location = m.location
This assumes that you still have your initial UNION in place that ensures that master-table has all possible locations included.
It is the where clause that excludes some locations. To ensure you retain every location, you could use "conditional aggregation" instead of the where clause, e.g.:
select location, sum(case when (time...) then amount else 0 end) as location_sum
from master-table
group by location
i.e. instead of excluding some rows from the result, place conditions inside the sum function that are equivalent to the conditions you would have used in the where clause. If those conditions are true, the amount is aggregated; if they evaluate to false, 0 is summed, but the location is retained in the result.

PostgreSQL GROUP BY that includes zeros

I have a SQL query (postgresql) that looks something like this:
SELECT
my_timestamp::timestamp::date as the_date,
count(*) as count
FROM my_table
WHERE ...
GROUP BY the_date
ORDER BY the_date
The result is a table of YYYY-MM-DD, count pairs.
Now I've been asked to fill in the empty dates with zero. So if I was previously providing
2022-03-15 3
2022-03-17 1
I'd now want to return
2022-03-15 3
2022-03-16 0
2022-03-17 1
Now I can easily do this client-side (relative to the database) and let my program compute and return the zero-augmented list to its clients, based on the original list from postgres. But perhaps it would be better if I could just tell postgresql to include zeros.
I suspect this isn't easy at all, because postgres has no obvious way of knowing what I'm up to. But in the interests of learning more about postgres and SQL, I thought I'd have try. The try isn't too promising thus far...
Any pointers before I conclude that I was right to leave this to my (postgres client) program?
Update
This is an interesting case where my simplification of the problem led to a correct answer that didn't work for me. For those who come after, I thought it worth documenting what followed, because it takes some fun twists through constructing SQL queries.
@a_horse_with_no_name responded with a query that I've verified works if I simplify my own query to match. Unfortunately, my query had some extra baggage that I didn't think pertinent, and so had trimmed out when posting the original question.
Here's my real (original) query, with all names preserved (if shortened):
-- current query
SELECT
LEAST(time1, time2, time3, time4)::timestamp::date as the_date,
count(*) as count
FROM reading_group_reader rgr
INNER JOIN ( SELECT group_id, group_type ::group_type_name
FROM (VALUES (31198, 'excerpt')) as T(group_id, group_type)) TT
ON TT.group_id = rgr.group_id
AND TT.group_type = rgr.group_type
WHERE LEAST(time1, time2, time3, time4) > current_date - 30
GROUP BY the_date
ORDER BY the_date;
If I translate that directly into the proposed solution, however, the inner join between reading_group_reader and the temporary table TT causes the left join to become inner (I think) and the date sequence drops its zeros again. Fwiw, TT is written as an inline table here because sometimes it actually is a subselect.
So I transformed my query into this:
SELECT
g.dt::date as the_date,
count(*) as count
FROM generate_series(date '2022-03-06', date '2022-04-06', interval '1 day') as g(dt)
LEFT JOIN (
SELECT
LEAST(rgr.time1, rgr.time2, rgr.time3, rgr.time4)::timestamp::date as the_date
FROM reading_group_reader rgr
INNER JOIN (
SELECT group_id, group_type ::group_type_name
FROM (VALUES (31198, 'excerpt')) as T(group_id, group_type)) TT
ON TT.group_id = rgr.group_id
AND TT.group_type = rgr.group_type
) rgrt
ON rgrt.the_date = g.dt::date
GROUP BY g.dt
ORDER BY the_date;
but this outputs 1's instead of 0's at the places that should be 0.
The reason for that, however, is that I've now selected every date, so of course there's one of each. I need to include an additional field (which will be NULL) and count that.
So this query finally does what I want:
SELECT
g.dt::date as the_date,
count(rgrt.device_id) as count
FROM generate_series(date '2022-03-06', date '2022-04-06', interval '1 day') as g(dt)
LEFT JOIN (
SELECT
LEAST(rgr.time1, rgr.time2, rgr.time3, rgr.time4)::timestamp::date as the_date,
rgr.device_id
FROM reading_group_reader rgr
INNER JOIN (
SELECT group_id, group_type ::group_type_name
FROM (VALUES (31198, 'excerpt')) as T(group_id, group_type)
) TT
ON TT.group_id = rgr.group_id
AND TT.group_type = rgr.group_type
) rgrt(the_date)
ON rgrt.the_date = g.dt::date
GROUP BY g.dt
ORDER BY g.dt;
And, of course, on re-reading the accepted answer, I eventually saw that he did count an unrelated field, which I'd simply missed on my first several readings.
You will need to join to a list of dates. This can e.g. be done using generate_series()
SELECT g.dt::date as the_date,
count(t.my_timestamp) as count
FROM generate_series(date '2022-03-01',
date '2022-03-31',
interval '1 day') as g(dt)
LEFT JOIN my_table as t
ON t.my_timestamp::date = g.dt::date
AND ... -- the original WHERE clause goes here!
GROUP BY the_date
ORDER BY the_date;
Note that the original WHERE conditions need to go into the join condition of the LEFT JOIN. You can't put them into a WHERE clause because that would turn the outer join back into an inner join (which means the missing dates wouldn't be returned).
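A minimal demonstration of that pitfall (invented rows; only the placement of the extra condition differs):

select g.dt::date as the_date, count(t.ts) as count
from generate_series(date '2022-03-15', date '2022-03-17', interval '1 day') as g(dt)
left join (values (timestamp '2022-03-15 10:00'), (timestamp '2022-03-17 09:00')) as t(ts)
  on t.ts::date = g.dt::date
 and t.ts >= timestamp '2022-03-01'  -- filter as a join condition: 2022-03-16 shows up with 0
group by the_date
order by the_date;
-- moving the last condition into a WHERE clause would silently drop the 2022-03-16 row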

Get apps with the highest review count since a dynamic series of days

I have two tables, apps and reviews (simplified for the sake of discussion):
apps table
id int
reviews table
id int
review_date date
app_id int (foreign key that points to apps)
2 questions:
1. How can I write a query / function to answer the following question?:
Given a series of dates from the earliest reviews.review_date to the latest reviews.review_date (incrementing by a day), for each date, D, which apps had the most reviews if the app's earliest review was on or later than D?
I think I know how to write a query if given an explicit date:
SELECT
apps.id,
count(reviews.*)
FROM
reviews
INNER JOIN apps ON apps.id = reviews.app_id
group by
1
having
min(reviews.review_date) >= '2020-01-01'
order by 2 desc
limit 10;
But I don't know how to query this dynamically given the desired date series and compile all this information in a single view.
2. What's the best way to model this data?
It would be nice to have the # of reviews at the time for each date as well as the app_id. As of now I'm thinking something that might look like:
... 2020-01-01_app_id | 2020-01-01_review_count | 2020-01-02_app_id | 2020-01-02_review_count ...
But I'm wondering if there's a better way to do this. Stitching the data together also seems like a challenge.
I think this is what you are looking for:
Postgres 13 or newer
WITH cte AS ( -- MATERIALIZED
SELECT app_id, min(review_date) AS earliest_review, count(*)::int AS total_ct
FROM reviews
GROUP BY 1
)
SELECT *
FROM (
SELECT generate_series(min(review_date)
, max(review_date)
, '1 day')::date
FROM reviews
) d(review_window_start)
LEFT JOIN LATERAL (
SELECT total_ct, array_agg(app_id) AS apps
FROM (
SELECT app_id, total_ct
FROM cte c
WHERE c.earliest_review >= d.review_window_start
ORDER BY total_ct DESC
FETCH FIRST 1 ROWS WITH TIES -- new & hot
) sub
GROUP BY 1
) a ON true;
WITH TIES makes it a bit cheaper. Added in Postgres 13 (currently beta). See:
Get top row(s) with highest value, with ties
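For a quick feel of the clause, a toy example (invented values; Postgres 13+):

select *
from (values ('app_a', 3), ('app_b', 5), ('app_c', 5)) v(app_id, total_ct)
order by total_ct desc
fetch first 1 rows with ties; -- returns both app_b and app_c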
Postgres 12 or older
WITH cte AS ( -- MATERIALIZED
SELECT app_id, min(review_date) AS earliest_review, count(*)::int AS total_ct
FROM reviews
GROUP BY 1
)
SELECT *
FROM (
SELECT generate_series(min(review_date)
, max(review_date)
, '1 day')::date
FROM reviews
) d(review_window_start)
LEFT JOIN LATERAL (
SELECT total_ct, array_agg(app_id) AS apps
FROM (
SELECT total_ct, app_id
, rank() OVER (ORDER BY total_ct DESC) AS rnk
FROM cte c
WHERE c.earliest_review >= d.review_window_start
) sub
WHERE rnk = 1
GROUP BY 1
) a ON true;
db<>fiddle here
Same as above, but without WITH TIES.
We don't need to involve the table apps at all. The table reviews has all information we need.
The CTE cte computes earliest review & current total count per app. The CTE avoids repeated computation. Should help quite a bit.
It is always materialized before Postgres 12, and should be materialized automatically in Postgres 12 since it is used many times in the main query. Else you could add the keyword MATERIALIZED in Postgres 12 or later to force it. See:
How to force evaluation of subquery before joining / pushing down to foreign server
The optimized generate_series() call produces the series of days from earliest to latest review. See:
Generating time series between two dates in PostgreSQL
Join a count query on generate_series() and retrieve Null values as '0'
Finally, the LEFT JOIN LATERAL you already discovered. But since multiple apps can tie for the most reviews, retrieve all winners, which can be 0 - n apps. The query aggregates all daily winners into an array, so we get a single result row per review_window_start. Alternatively, define tiebreaker(s) to get at most one winner. See:
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
If you are looking for hints, then here are a few:
Are you aware of generate_series() and how to use it to compose a table of dates given a start and end date? If not, there are plenty of examples on this site (and a minimal sketch after these hints).
To answer this question for any given date, you need to have only two measures for each app, and only one of these is used to compare an app against other apps. Your query in part 1 shows that you know what these two measures are.
Hints 1 and 2 should be enough to get this done. The only thing I can add is for you not to worry about making the database do "too much work." That is what it is there to do. If it does not do it quickly enough, then you can think about optimizations, but before you get to that step, concentrate on getting the answer that you want.
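For hint 1, a minimal sketch (the bounds are made up; in the real query they would come from min/max of reviews.review_date):

select generate_series(date '2020-01-01', date '2020-01-07', interval '1 day')::date as review_window_start;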
Please comment if you need further clarification on this.
The missing piece for me was lateral join.
I can accomplish just about what I want using the following:
select
review_windows.review_window_start,
id,
review_total,
earliest_review
from
(
select
date_trunc('day', review_windows.review_windows) :: date as review_window_start
from
generate_series(
(
SELECT
min(reviews.review_date)
FROM
reviews
),
(
SELECT
max(reviews.review_date)
FROM
reviews
),
'1 year'
) review_windows
order by
1 desc
) review_windows
left join lateral (
SELECT
apps.id,
count(reviews.*) as review_total,
min(reviews.review_date) as earliest_review
FROM
reviews
INNER JOIN apps ON apps.id = reviews.app_id
where
reviews.review_date >= review_windows.review_window_start
group by
1
having
min(reviews.review_date) >= review_windows.review_window_start
order by
2 desc,
3 desc
limit
2
) apps_most_reviews on true;

Firebird - Calculate time difference between two rows

Overview: I have tables SHIFT_LOG, SHIFT_LOG_DET & SHIFT_LOG_ENTRY having Parent-Child-GrandChild relationships (one-to-many). So,
LOG table contains shift details.
LOG_DET contains operators in a particular shift &
LOG_ENTRY table logs the different entry types and a timestamp for a user in a shift, like (ADDED, STARTED, ON-BREAK, JOINED, ENDED).
Problem: For a given shift I can get all operators and their entries using the query below. What I can't do is find the duration an operator spent on a particular entry type, i.e. the difference between two rows' ENTRY_TIME.
SELECT
ent.ID as ENT_ID,
det.ID as DET_ID,
usr.CODE as USR_ID,
ent.SHIFT_LOG_DET_ID,
ent.ENTRY_TYPE,
IIF(ent.ENTRY_TYPE = 0 , 'ADDED',
IIF(ent.ENTRY_TYPE = 1 , 'STARTED',
IIF(ent.ENTRY_TYPE = 2 , 'ON-BREAK',
IIF(ent.ENTRY_TYPE = 3 , 'JOINED',
IIF(ent.ENTRY_TYPE = 4 , 'ENDED', 'UNKNOWN ENTRY'))))) as ENTRY_TYPE_VALUE,
ent.ENTRY_TIME+cast('31.12.1899' as timestamp) as ENTRY_TIME
FROM SHIFT_LOG_ENTRY ent
LEFT JOIN SHIFT_LOG_DET det on det.ID = ent.SHIFT_LOG_DET_ID
LEFT JOIN SHIFT_LOG log on log.ID = det.SHIFT_LOG_ID
LEFT JOIN USERS usr on usr.USERID = det.OPERATOR_ID
WHERE log.ID = 1
GROUP BY
usr.CODE,
ent.SHIFT_LOG_DET_ID,
det.ID,
ent.ID,
ENTRY_TYPE_VALUE,
ent.ENTRY_TIME,
ent.ENTRY_TYPE
Result Set:
So Interval is the time spent in seconds on a particular ENTRY_TYPE, i.e.
ROW(1).Interval = ( Row(2).EntryTime - Row(1).EntryTime )
Entry type ENDED has no interval, as there is no other entry for the user after the shift has ended.
Firebird version is 2.5.3
Here is a different, "pro-active" approach. Whether it fits your workflow, decide for yourself. It is based upon adding a special extra column just to link adjacent rows together.
Since LOG_ENTRY is a log of events, the events come from the same source, and the events are rather long (15 seconds is a lot for a computer), I would assume that:
Data is only added to the table; it is very rarely or never edited or deleted.
Data is added in an ordered manner, that is, whenever an event is inserted it is the LAST event in the batch (in your case a batch seems to mean: for the given operator and the given shift).
If those assumptions hold, I'd add one more (indexed!) column to the table: batch_internal_id. It will start at zero on your selected row #1, will be 1 on the next row, 2 on row #3 and so forth. It will be reset back to zero when the batch changes (on row #8 in your screenshot).
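For rows already in the table, a one-off backfill could look like this (a sketch; the index name is invented, and it assumes ENTRY_TIME has no duplicates within one SHIFT_LOG_DET_ID, per the ordered-insert assumption above):

ALTER TABLE SHIFT_LOG_ENTRY ADD batch_internal_id INTEGER;
CREATE INDEX IDX_SHIFT_LOG_ENTRY_BATCH ON SHIFT_LOG_ENTRY (SHIFT_LOG_DET_ID, batch_internal_id);

UPDATE SHIFT_LOG_ENTRY e
SET batch_internal_id = (
  SELECT COUNT(*)
  FROM SHIFT_LOG_ENTRY p
  WHERE p.SHIFT_LOG_DET_ID = e.SHIFT_LOG_DET_ID
    AND p.ENTRY_TIME < e.ENTRY_TIME
); -- 0 for the first row of each batch, 1 for the next, and so on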
After that, the calculation of the elapsed time becomes a simple self-join on adjacent IDs, which should usually be faster than having many sub-selects, one per row.
Something like that:
SELECT
ent.ID as ENT_ID,
ent.SHIFT_LOG_DET_ID,
ent.ENTRY_TYPE,
DECODE(ent.ENTRY_TYPE, 0 , 'ADDED', 1 , 'STARTED', 2 , 'ON-BREAK',
3 , 'JOINED', 4 , 'ENDED', 'UNKNOWN ENTRY')
as ENTRY_TYPE_VALUE, -- better make it an extra table to join!
ent.ENTRY_TIME+cast('31.12.1899' as timestamp) as ENTRY_TIME,
ent_next.ENTRY_TIME - ent.ENTRY_TIME as time_elapsed
FROM SHIFT_LOG_ENTRY ent
LEFT JOIN SHIFT_LOG_ENTRY ent_next ON
(ent.SHIFT_LOG_DET_ID = ent_next.SHIFT_LOG_DET_ID) and
(ent.batch_internal_id + 1 = ent_next.batch_internal_id)
ORDER BY ent.SHIFT_LOG_DET_ID, ent.batch_internal_id
The trick then would be to ensure correct filling of batch_internal_id within every batch and at the same time isolated from other batches.
Here is where the assumptions above become important.
You can easily auto-fill the new internal (batch-relative) ID column from a SQL trigger, provided you keep the guarantee that the event being inserted is always the last one in its batch.
Something like this:
CREATE TRIGGER SHIFT_LOG_ENTRY_LINK_EVENTS
FOR SHIFT_LOG_ENTRY
ACTIVE BEFORE INSERT OR UPDATE
AS
BEGIN
NEW.batch_internal_id = 0;
SELECT FIRST(1) -- we only need one last row per same batch
prev.batch_internal_id + 1 -- next value
FROM SHIFT_LOG_ENTRY prev
WHERE prev.SHIFT_LOG_DET_ID = NEW.SHIFT_LOG_DET_ID -- batch definition
ORDER BY prev.ENTRY_TIME DESCENDING
INTO NEW.batch_internal_id;
END
Such a trigger initializes the relative ID with zero when a new batch is started, and with the incremented last ID if there already were other rows for the batch.
It is, however, critically dependent upon always being called in order: all of the batch's previous rows must already be inserted, and none of the next rows yet.
One can also write the command a bit more laconically, though it is maybe harder to read:
.......
AS
BEGIN
NEW.batch_internal_id =
COALESCE( (
SELECT FIRST(1) -- we only need one last row per same batch
prev.batch_internal_id + 1 -- next value
FROM SHIFT_LOG_ENTRY prev
WHERE prev.SHIFT_LOG_DET_ID = NEW.SHIFT_LOG_DET_ID -- batch definition
ORDER BY prev.ENTRY_TIME DESCENDING
) , 0);
END
You will need to select the next date from the relevant entries. You can do this using something like:
select
SHIFT_LOG_DET_ID,
ENTRY_TIME,
datediff(minute from ENTRY_TIME to NEXT_ENTRY_TIME) as DURATION
from (
select
a.SHIFT_LOG_DET_ID,
a.ENTRY_TIME,
(select min(ENTRY_TIME)
from SHIFT_LOG_ENTRY
where SHIFT_LOG_DET_ID = a.SHIFT_LOG_DET_ID
and ENTRY_TIME > a.ENTRY_TIME) as NEXT_ENTRY_TIME
from SHIFT_LOG_ENTRY a
) b
See also this fiddle.
In Firebird 3, you can use the window function LEAD to achieve this:
select
SHIFT_LOG_DET_ID,
ENTRY_TIME,
datediff(minute from ENTRY_TIME
to lead(ENTRY_TIME) over (partition by SHIFT_LOG_DET_ID order by ENTRY_TIME)) as DURATION
from SHIFT_LOG_ENTRY
Full solution
This solution was contributed by AlphaTry
select
ENT_ID,
DET_ID,
USR_CODE,
SHIFT_LOG_DET_ID,
ENTRY_TYPE,
ENTRY_TYPE_VALUE,
ENTRY_TIME,
datediff(second from ENTRY_TIME to NEXT_ENTRY_TIME) as DURATION
from (
SELECT
ent.ID as ENT_ID,
det.ID as DET_ID,
usr.CODE as USR_CODE,
ent.SHIFT_LOG_DET_ID,
ent.ENTRY_TYPE as ENTRY_TYPE,
case (ent.ENTRY_TYPE)
when '0' then 'ADDED'
when '1' then 'STARTED'
when '2' then 'ON-BREAK'
when '3' then 'JOINED'
when '4' then 'ENDED'
else 'UNKNOWN ENTRY'
end as ENTRY_TYPE_VALUE,
ent.ENTRY_TIME+cast('31.12.1899' as timestamp) as ENTRY_TIME,
(
select min(ENTRY_TIME)
from SHIFT_LOG_ENTRY
where SHIFT_LOG_DET_ID = ent.SHIFT_LOG_DET_ID
and ENTRY_TIME > ent.ENTRY_TIME
)+cast('31.12.1899' as timestamp) as NEXT_ENTRY_TIME
FROM SHIFT_LOG_ENTRY ent
LEFT JOIN SHIFT_LOG_DET det on det.ID = ent.SHIFT_LOG_DET_ID
LEFT JOIN SHIFT_LOG log on log.ID = det.SHIFT_LOG_ID
LEFT JOIN USERS usr on usr.USERID = det.OPERATOR_ID
WHERE log.ID = 1
GROUP BY
usr.CODE,
ent.SHIFT_LOG_DET_ID,
det.ID,
ent.ID,
ENTRY_TYPE_VALUE,
ent.ENTRY_TIME,
ent.ENTRY_TYPE
) b
Result