I want to calculate the total number of employees logged in to the system at a particular time, and also the total work hours of each employee.
We have a login system that stores the data in the following structure:
1. EmpId
2. Status
3. Created
The above data is stored in the following table:
EmpId Status Created
1 In 2019-10-23 12:00:00
1 Out 2019-10-23 12:45:45
2 In 2019-10-23 14:25:40
1 In 2019-10-23 18:45:45
2 Out 2019-10-23 20:50:40
2 In 2019-10-24 1:27:24
3 In 2019-10-24 2:45:45
Each In is followed by an Out and vice versa. An employee's work session can also span days, i.e. the In and the matching Out may fall on different dates.
I need to implement the following:
1. How can I calculate the number of employees logged in at a particular time, say "2019-10-23 14:12:45"?
2. How can I calculate the total work hours of all the employees since the start?
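You can use this query to get the employees logged in at a specific time. LEAD has to see both the In and the Out rows, so the Status filter goes in the outer query (window functions are evaluated after the WHERE clause of their own SELECT):
SELECT EmpID, [In], [Out]
FROM (
SELECT EmpID, Status, Created AS [In], LEAD(Created) OVER (PARTITION BY EmpID ORDER BY Created ASC) AS [Out]
FROM table_name
) t
WHERE Status = 'In'
AND ('2019-10-23 14:12:45' BETWEEN [In] AND [Out]
OR ('2019-10-23 14:12:45' >= [In] AND [Out] IS NULL));
Replace the select list with COUNT(*) to get only the number of employees.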
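... and the following query to get the total work hours of each employee as a TIME value. Sessions that are still open ([Out] is NULL) are not counted, and because a TIME value wraps around after 24 hours, larger totals are better returned as a number of seconds:
SELECT EmpID, CONVERT(TIME(0), DATEADD(S, ISNULL(SUM(DATEDIFF(S, [In], [Out])), 0), 0), 108) AS work_hours
FROM (
SELECT EmpID, Status, Created AS [In], LEAD(Created) OVER (PARTITION BY EmpID ORDER BY Created ASC) AS [Out]
FROM table_name
) t
WHERE Status = 'In'
GROUP BY EmpID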
You can also define a CTE (common table expression) and use it in the queries instead of the sub-select, to get a flat table with the login time and logout time as columns:
WITH Employees(EmpID, Status, [In], [Out]) AS
(
SELECT EmpID, Status, Created AS [In], LEAD(Created) OVER (PARTITION BY EmpID ORDER BY Created ASC) AS [Out]
FROM table_name
)
SELECT EmpID, [In], [Out]
FROM Employees
WHERE Status = 'In';
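To check both queries end to end, here is a small sketch that runs the same LEAD pairing in SQLite through Python's built-in sqlite3 module. It is a stand-in for SQL Server under several assumptions: it needs SQLite 3.25+ for window functions, uses julianday() instead of DATEDIFF, renames [In]/[Out] to login/logout, zero-pads the hours in the sample data (SQLite compares these timestamps as text), and the table name logins and all helper names are made up for illustration. Work time is returned as seconds and formatted in Python, so totals above 24 hours do not wrap.

```python
import sqlite3

# Sample rows from the question. Hours are zero-padded because SQLite
# compares these ISO-8601 timestamps as plain text.
ROWS = [
    (1, "In", "2019-10-23 12:00:00"),
    (1, "Out", "2019-10-23 12:45:45"),
    (2, "In", "2019-10-23 14:25:40"),
    (1, "In", "2019-10-23 18:45:45"),
    (2, "Out", "2019-10-23 20:50:40"),
    (2, "In", "2019-10-24 01:27:24"),
    (3, "In", "2019-10-24 02:45:45"),
]

# LEAD sees every row (In and Out) before the outer WHERE keeps the In rows,
# so logout is the next event of the same employee, i.e. the matching Out.
SESSIONS = """
SELECT EmpId, Status, Created AS login,
       LEAD(Created) OVER (PARTITION BY EmpId ORDER BY Created) AS logout
FROM logins
"""

LOGGED_IN_AT = f"""
SELECT COUNT(*)
FROM ({SESSIONS}) AS t
WHERE Status = 'In'
  AND login <= :at
  AND (logout IS NULL OR :at <= logout)
"""

# Still-open sessions give NULL, which SUM skips; COALESCE turns
# "no completed session yet" into 0 seconds.
WORK_SECONDS = f"""
SELECT EmpId,
       COALESCE(SUM(CAST(ROUND((julianday(logout) - julianday(login)) * 86400) AS INTEGER)), 0)
FROM ({SESSIONS}) AS t
WHERE Status = 'In'
GROUP BY EmpId
"""


def make_db():
    """Hypothetical helper: in-memory table `logins` filled with ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logins (EmpId INTEGER, Status TEXT, Created TEXT)")
    conn.executemany("INSERT INTO logins VALUES (?, ?, ?)", ROWS)
    return conn


def logged_in_at(conn, at):
    """Number of employees whose session covers the timestamp `at`."""
    return conn.execute(LOGGED_IN_AT, {"at": at}).fetchone()[0]


def work_seconds(conn):
    """Completed work time per employee, in whole seconds."""
    return dict(conn.execute(WORK_SECONDS).fetchall())


def as_hms(seconds):
    """Format seconds as HH:MM:SS without wrapping at 24 hours."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


conn = make_db()
print(logged_in_at(conn, "2019-10-23 15:00:00"))  # 1: only employee 2
print({emp: as_hms(s) for emp, s in work_seconds(conn).items()})
```

At the question's example time, 2019-10-23 14:12:45, the count is 0: employee 1 logged out at 12:45:45 and employee 2 only logs in at 14:25:40.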
Assuming that Ins and Outs are interleaved, you can use conditional aggregation and filtering to count the employees logged in at a given time:
select sum(case when status = 'In' then 1
when status = 'Out' then -1
end) as employees_at_time
from t
where created <= '2019-10-23 14:12:45';
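The running +1/-1 idea can be sketched the same way, again with SQLite through Python's sqlite3 standing in for the original database. The table name logins and the helper names are assumptions, and the sample hours are zero-padded so that text comparison orders them correctly.

```python
import sqlite3

# Sample rows from the question (hours zero-padded for text comparison).
ROWS = [
    (1, "In", "2019-10-23 12:00:00"),
    (1, "Out", "2019-10-23 12:45:45"),
    (2, "In", "2019-10-23 14:25:40"),
    (1, "In", "2019-10-23 18:45:45"),
    (2, "Out", "2019-10-23 20:50:40"),
    (2, "In", "2019-10-24 01:27:24"),
    (3, "In", "2019-10-24 02:45:45"),
]

# +1 for every In and -1 for every Out up to the given moment. This is only
# correct if each employee's In and Out rows strictly alternate.
HEADCOUNT = """
SELECT COALESCE(SUM(CASE WHEN Status = 'In' THEN 1
                         WHEN Status = 'Out' THEN -1 END), 0)
FROM logins
WHERE Created <= :at
"""


def make_db():
    """Hypothetical helper: in-memory table `logins` filled with ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logins (EmpId INTEGER, Status TEXT, Created TEXT)")
    conn.executemany("INSERT INTO logins VALUES (?, ?, ?)", ROWS)
    return conn


def headcount(conn, at):
    """Employees logged in at `at`, by the running +1/-1 sum."""
    return conn.execute(HEADCOUNT, {"at": at}).fetchone()[0]


conn = make_db()
print(headcount(conn, "2019-10-23 19:00:00"))  # 2: employees 1 and 2
```

Unlike the BETWEEN-based query, an employee who logs out exactly at the requested moment is not counted here, because their Out row already satisfies created <= the requested time.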
I have a dataset [Table_1] that records each event on a new row, so there are multiple entries for each customer_id. The structure is:
customer_id recorded_at event_type value
123-456-789 2022-05-28 status open
123-456-789 2022-06-01 attribute order_placed
123-456-789 2022-06-02 attribute order_fulfilled
123-456-789 2022-06-04 status closed
123-456-789 2022-06-05 attribute order_placed
123-456-789 2022-06-07 attribute order_fulfilled
123-456-789 2022-06-10 status open
123-456-789 2022-06-11 attribute order_placed
123-456-789 2022-06-12 attribute order_fulfilled
123-456-789 2022-06-15 attribute order_placed
123-456-789 2022-06-17 attribute order_fulfilled
987-654-321 2022-06-12 status open
987-654-321 2022-06-15 attribute order_placed
987-654-321 2022-06-17 attribute order_fulfilled
987-654-321 2022-06-17 status closed
What I'm trying to do is write a query that returns the dates of the two attributes, order_placed and order_fulfilled, after the last time the status went open. My approach is to query the dataset three times: first for all customers who went open, then for the dates when the attributes are order_placed and order_fulfilled. However, I'm running into issues returning all instances where the attributes are order_placed and order_fulfilled, not just the most recent one.
With d1 as (Select customer_id, recorded_at as open_time from Table_1 where event_type = 'status' and value = 'open')
Select d1.customer_id,
d1.open_time,
order_placed.order_placed_time,
order_filled.order_fulfilled_time as order_filled_time
from d1
left join (Select customer_id, max(recorded_at) as order_placed_time from Table_1 where event_type = 'attribute' and value = 'order_placed' group by customer_id) order_placed
on d1.customer_id = order_placed.customer_id and order_placed.order_placed_time > d1.open_time
left join (Select customer_id, max(recorded_at) as order_fulfilled_time from Table_1 where event_type = 'attribute' and value = 'order_fulfilled' group by customer_id) order_filled
on d1.customer_id = order_filled.customer_id and order_filled.order_fulfilled_time > d1.open_time
where order_filled.order_fulfilled_time > order_placed.order_placed_time
However, this only returns the last time an order was placed and fulfilled after the status = open, not every instance where that happened. The output I am going for would look like:
customer_id open_time order_placed_time order_filled_time
123-456-789 2022-05-28 2022-06-01 2022-06-02
123-456-789 2022-06-10 2022-06-11 2022-06-12
123-456-789 2022-06-10 2022-06-15 2022-06-17
987-654-321 2022-06-12 2022-06-15 2022-06-17
Consider the query below:
WITH orders AS (
SELECT *, SUM(IF(value IN ('open', 'closed'), 1, 0)) OVER w AS order_group
FROM sample
WINDOW w AS (PARTITION BY customer_id ORDER BY recorded_at, event_type)
)
SELECT customer_id, open_time, pre_recorded_at AS order_placed_time, recorded_at AS order_filled_time
FROM (
SELECT *, FIRST_VALUE(IF(value = 'open', recorded_at, NULL)) OVER w AS open_time,
LAG(recorded_at) OVER w AS pre_recorded_at
FROM orders
WINDOW w AS (PARTITION BY customer_id, order_group ORDER BY recorded_at)
)
WHERE open_time IS NOT NULL AND value = 'order_fulfilled'
;
Note: the orders CTE has an unusual event_type column in its ORDER BY clause because of these two transactions in your dataset, which share the same recorded_at:
987-654-321 2022-06-17 attribute order_fulfilled
987-654-321 2022-06-17 status closed
WINDOW w AS (PARTITION BY customer_id ORDER BY recorded_at, event_type)
Since 'attribute' sorts before 'status', the order_fulfilled row is counted in the open group rather than the closed one. If recorded_at holds a more precise timestamp, event_type can be removed from the ORDER BY; I'll leave that to you.
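To try the grouping idea outside BigQuery, here is a rough SQLite port run through Python's sqlite3. It is an adaptation, not the answer's own code: IF() becomes CASE, the named WINDOW clauses are written inline, the table is called events, and SQLite 3.25+ is assumed for window functions. With the sample data from the question it returns one row per order_placed/order_fulfilled pair that follows an open.

```python
import sqlite3

# Sample events from the question: (customer_id, recorded_at, event_type, value)
EVENTS = [
    ("123-456-789", "2022-05-28", "status", "open"),
    ("123-456-789", "2022-06-01", "attribute", "order_placed"),
    ("123-456-789", "2022-06-02", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-04", "status", "closed"),
    ("123-456-789", "2022-06-05", "attribute", "order_placed"),
    ("123-456-789", "2022-06-07", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-10", "status", "open"),
    ("123-456-789", "2022-06-11", "attribute", "order_placed"),
    ("123-456-789", "2022-06-12", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-15", "attribute", "order_placed"),
    ("123-456-789", "2022-06-17", "attribute", "order_fulfilled"),
    ("987-654-321", "2022-06-12", "status", "open"),
    ("987-654-321", "2022-06-15", "attribute", "order_placed"),
    ("987-654-321", "2022-06-17", "attribute", "order_fulfilled"),
    ("987-654-321", "2022-06-17", "status", "closed"),
]

QUERY = """
WITH orders AS (
    -- a new group starts at every open or closed row
    SELECT *,
           SUM(CASE WHEN value IN ('open', 'closed') THEN 1 ELSE 0 END)
               OVER (PARTITION BY customer_id ORDER BY recorded_at, event_type) AS order_group
    FROM events
)
SELECT customer_id, open_time,
       pre_recorded_at AS order_placed_time,
       recorded_at AS order_filled_time
FROM (
    SELECT *,
           -- non-NULL only for groups that start with an open row
           FIRST_VALUE(CASE WHEN value = 'open' THEN recorded_at END)
               OVER (PARTITION BY customer_id, order_group ORDER BY recorded_at) AS open_time,
           -- for an order_fulfilled row, the previous row is its order_placed
           LAG(recorded_at)
               OVER (PARTITION BY customer_id, order_group ORDER BY recorded_at) AS pre_recorded_at
    FROM orders
) AS t
WHERE open_time IS NOT NULL AND value = 'order_fulfilled'
ORDER BY customer_id, order_filled_time
"""


def make_db():
    """Hypothetical helper: in-memory table `events` filled with EVENTS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (customer_id TEXT, recorded_at TEXT, event_type TEXT, value TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", EVENTS)
    return conn


conn = make_db()
for row in conn.execute(QUERY):
    print(row)
```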
One option to solve this problem is to follow these steps:
keep only the rows between an "open" and the next "closed", which removes the "closed" rows and everything up to the next "open"
assign a unique id to each ("order_placed", "order_fulfilled") pair
extract the "open_time", "order_placed_time" and "order_fulfilled_time" values into three separate fields with CASE expressions
apply separate aggregations over "open_time" and over "order_placed/fulfilled_time", since each "open_time" can have multiple pairs of orders.
These four steps are implemented in two CTEs.
The first CTE includes:
the first COUNT, which counts both "open" and "closed" rows and therefore yields odd values for the open/order_placed/order_fulfilled rows (orders following an open) and even values for the closed/order_placed/order_fulfilled rows (orders following a closed)
the second COUNT, which increments on every row except "order_fulfilled" and therefore yields a different value for each ("order_placed", "order_fulfilled") pair
SELECT *,
COUNT(CASE WHEN value = 'open' THEN 1
WHEN value = 'closed' THEN 0 END) OVER (
PARTITION BY customer_id
ORDER BY recorded_at, event_type
) AS status_value,
COUNT(CASE WHEN value <> 'order_fulfilled' THEN 1 END) OVER(
PARTITION BY customer_id
ORDER BY recorded_at, event_type
) AS order_value
FROM tab
The second CTE includes:
a WHERE clause that filters out all rows found between a "closed" and an "open" value (the "closed" included, the "open" excluded), i.e. the rows with an even status_value
the first MAX window function, partitioned by customer and by the first COUNT (status_value), to extract the "open_time" value
the second MAX window function, partitioned by customer and by the second COUNT (order_value), to extract the "order_placed_time" value
the third MAX window function, partitioned by customer and by the second COUNT (order_value), to extract the "order_fulfilled_time" value
SELECT customer_id,
MAX(CASE WHEN value = 'open' THEN recorded_at END) OVER(
PARTITION BY customer_id, status_value
) AS open_time,
MAX(CASE WHEN value = 'order_placed' THEN recorded_at END) OVER(
PARTITION BY customer_id, order_value
) AS order_placed_time,
MAX(CASE WHEN value = 'order_fulfilled' THEN recorded_at END) OVER(
PARTITION BY customer_id, order_value
) AS order_fulfilled_time
FROM cte
WHERE MOD(status_value, 2) = 1
Note that the three MAX functions cannot be written as ordinary aggregates with a single GROUP BY clause, because the first MAX groups on status_value while the other two group on order_value.
The final query uses the CTEs and adds:
a selection of DISTINCT rows (the window functions repeat the same values on every row of a partition)
a filter that removes rows whose "order_fulfilled_time" is NULL (these correspond to the former "open" rows).
WITH cte AS (
SELECT *,
COUNT(CASE WHEN value = 'open' THEN 1
WHEN value = 'closed' THEN 0 END) OVER (
PARTITION BY customer_id
ORDER BY recorded_at, event_type
) AS status_value,
COUNT(CASE WHEN value <> 'order_fulfilled' THEN 1 END) OVER(
PARTITION BY customer_id
ORDER BY recorded_at, event_type
) AS order_value
FROM tab
), cte2 AS(
SELECT customer_id,
MAX(CASE WHEN value = 'open' THEN recorded_at END) OVER(
PARTITION BY customer_id, status_value
) AS open_time,
MAX(CASE WHEN value = 'order_placed' THEN recorded_at END) OVER(
PARTITION BY customer_id, order_value
) AS order_placed_time,
MAX(CASE WHEN value = 'order_fulfilled' THEN recorded_at END) OVER(
PARTITION BY customer_id, order_value
) AS order_fulfilled_time
FROM cte
WHERE MOD(status_value, 2) = 1
)
SELECT DISTINCT *
FROM cte2
WHERE order_fulfilled_time IS NOT NULL
I'd recommend checking the intermediate output of each step to understand this solution in depth.
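One way to inspect those steps is to run the query locally. This sketch is a port to SQLite through Python's sqlite3 under stated assumptions: the table is called events instead of tab, SQLite 3.25+ is needed for window functions, and MOD(status_value, 2) becomes status_value % 2 because MOD() belongs to SQLite's optional math functions. Selecting from cte or cte2 instead of the final query shows the intermediate results.

```python
import sqlite3

# Sample events from the question: (customer_id, recorded_at, event_type, value)
EVENTS = [
    ("123-456-789", "2022-05-28", "status", "open"),
    ("123-456-789", "2022-06-01", "attribute", "order_placed"),
    ("123-456-789", "2022-06-02", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-04", "status", "closed"),
    ("123-456-789", "2022-06-05", "attribute", "order_placed"),
    ("123-456-789", "2022-06-07", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-10", "status", "open"),
    ("123-456-789", "2022-06-11", "attribute", "order_placed"),
    ("123-456-789", "2022-06-12", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-15", "attribute", "order_placed"),
    ("123-456-789", "2022-06-17", "attribute", "order_fulfilled"),
    ("987-654-321", "2022-06-12", "status", "open"),
    ("987-654-321", "2022-06-15", "attribute", "order_placed"),
    ("987-654-321", "2022-06-17", "attribute", "order_fulfilled"),
    ("987-654-321", "2022-06-17", "status", "closed"),
]

QUERY = """
WITH cte AS (
    SELECT *,
           -- running count of open/closed rows: odd after an open, even after a closed
           COUNT(CASE WHEN value = 'open' THEN 1
                      WHEN value = 'closed' THEN 0 END)
               OVER (PARTITION BY customer_id ORDER BY recorded_at, event_type) AS status_value,
           -- skips order_fulfilled, so it shares the value of its order_placed
           COUNT(CASE WHEN value <> 'order_fulfilled' THEN 1 END)
               OVER (PARTITION BY customer_id ORDER BY recorded_at, event_type) AS order_value
    FROM events
), cte2 AS (
    SELECT customer_id,
           MAX(CASE WHEN value = 'open' THEN recorded_at END)
               OVER (PARTITION BY customer_id, status_value) AS open_time,
           MAX(CASE WHEN value = 'order_placed' THEN recorded_at END)
               OVER (PARTITION BY customer_id, order_value) AS order_placed_time,
           MAX(CASE WHEN value = 'order_fulfilled' THEN recorded_at END)
               OVER (PARTITION BY customer_id, order_value) AS order_fulfilled_time
    FROM cte
    WHERE status_value % 2 = 1
)
SELECT DISTINCT *
FROM cte2
WHERE order_fulfilled_time IS NOT NULL
ORDER BY customer_id, order_fulfilled_time
"""


def make_db():
    """Hypothetical helper: in-memory table `events` filled with EVENTS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (customer_id TEXT, recorded_at TEXT, event_type TEXT, value TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", EVENTS)
    return conn


conn = make_db()
for row in conn.execute(QUERY):
    print(row)
```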
Consider yet another option:
with order_groups as (
select *,
countif(value in ('open', 'closed')) over order_group_sorted as group_num,
countif(value = 'order_placed') over order_group_sorted as subgroup_num,
from your_table
window order_group_sorted as (partition by customer_id order by recorded_at, event_type)
)
select * except(subgroup_num) from (
select customer_id, recorded_at, value, subgroup_num,
max(if(value = 'open', recorded_at, null)) over order_group as open_time
from order_groups
window order_group as (partition by customer_id, group_num)
)
pivot (any_value(recorded_at) for value in ('order_placed', 'order_fulfilled'))
where open_time is not null and order_placed is not null
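SQLite has no PIVOT, so this sketch swaps in conditional aggregation (MAX(CASE ...)) grouped by customer_id, open_time and subgroup_num, the columns BigQuery's PIVOT groups on implicitly in the query above; COUNTIF becomes SUM(CASE ...). It runs through Python's sqlite3 (SQLite 3.25+ assumed), and the table name events is an assumption.

```python
import sqlite3

# Sample events from the question: (customer_id, recorded_at, event_type, value)
EVENTS = [
    ("123-456-789", "2022-05-28", "status", "open"),
    ("123-456-789", "2022-06-01", "attribute", "order_placed"),
    ("123-456-789", "2022-06-02", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-04", "status", "closed"),
    ("123-456-789", "2022-06-05", "attribute", "order_placed"),
    ("123-456-789", "2022-06-07", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-10", "status", "open"),
    ("123-456-789", "2022-06-11", "attribute", "order_placed"),
    ("123-456-789", "2022-06-12", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-15", "attribute", "order_placed"),
    ("123-456-789", "2022-06-17", "attribute", "order_fulfilled"),
    ("987-654-321", "2022-06-12", "status", "open"),
    ("987-654-321", "2022-06-15", "attribute", "order_placed"),
    ("987-654-321", "2022-06-17", "attribute", "order_fulfilled"),
    ("987-654-321", "2022-06-17", "status", "closed"),
]

QUERY = """
WITH order_groups AS (
    SELECT *,
           SUM(CASE WHEN value IN ('open', 'closed') THEN 1 ELSE 0 END)
               OVER (PARTITION BY customer_id ORDER BY recorded_at, event_type) AS group_num,
           SUM(CASE WHEN value = 'order_placed' THEN 1 ELSE 0 END)
               OVER (PARTITION BY customer_id ORDER BY recorded_at, event_type) AS subgroup_num
    FROM events
), tagged AS (
    SELECT customer_id, recorded_at, value, subgroup_num,
           MAX(CASE WHEN value = 'open' THEN recorded_at END)
               OVER (PARTITION BY customer_id, group_num) AS open_time
    FROM order_groups
)
-- PIVOT emulated with conditional aggregation
SELECT customer_id, open_time,
       MAX(CASE WHEN value = 'order_placed' THEN recorded_at END) AS order_placed,
       MAX(CASE WHEN value = 'order_fulfilled' THEN recorded_at END) AS order_fulfilled
FROM tagged
WHERE open_time IS NOT NULL
GROUP BY customer_id, open_time, subgroup_num
HAVING MAX(CASE WHEN value = 'order_placed' THEN recorded_at END) IS NOT NULL
ORDER BY customer_id, order_placed
"""


def make_db():
    """Hypothetical helper: in-memory table `events` filled with EVENTS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (customer_id TEXT, recorded_at TEXT, event_type TEXT, value TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", EVENTS)
    return conn


conn = make_db()
for row in conn.execute(QUERY):
    print(row)
```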
with data as (
select *, sum(case when value = 'open' then 1 end) over (partition by customer_id order by recorded_at) as grp
from T
)
select customer_id,
min(case when value = 'open' then recorded_at end) as open_time,
...
from data
group by customer_id, grp
I am very new to SQL and query writing, and after a lot of trying I am asking for help.
As shown in the picture, I want to partition the data based on is_late = 1 and show its count (which is 2), but at the same time capture the value of last_status where is_late = 0, so that everything is displayed in a single row.
The task is to calculate how many times the rider was late, and the time taken from the first occurrence of the estimated time to the last_status.
You can use the following query:
SELECT
rider_id,
task_created_time,
expected_time_to_arrive,
is_late,
last_status,
task_count,
CONVERT(VARCHAR(5), DATEADD(MINUTE, DATEDIFF(MINUTE, expected_time_to_arrive, last_status), 0), 114) AS time_delayed
FROM
(SELECT
rider_id,
task_created_time,
expected_time_to_arrive,
is_late,
SUM(CASE WHEN is_late = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY rider_id) AS task_count,
ROW_NUMBER() OVER(PARTITION BY rider_id ORDER BY task_created_time) AS num,
MAX(last_status) OVER(PARTITION BY rider_id) AS last_status
FROM myTestTable) t
WHERE num = 1
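The question's data was only shown as a picture, so this sketch uses hypothetical rows to exercise the same window logic in SQLite through Python's sqlite3. Assumptions: the earliest task (by task_created_time) supplies the first expected arrival time, julianday() replaces DATEDIFF, the delay is returned in minutes and formatted as HH:MM in Python, and the inner MAX is aliased max_last_status to avoid shadowing the column.

```python
import sqlite3

# Hypothetical rows (the original data was an image):
# (rider_id, task_created_time, expected_time_to_arrive, is_late, last_status)
ROWS = [
    (1, "2021-01-01 10:00", "2021-01-01 10:30", 1, "2021-01-01 10:45"),
    (1, "2021-01-01 10:05", "2021-01-01 10:35", 1, "2021-01-01 10:50"),
    (1, "2021-01-01 10:10", "2021-01-01 10:40", 0, "2021-01-01 11:20"),
    (2, "2021-01-01 09:00", "2021-01-01 09:20", 1, "2021-01-01 09:50"),
]

QUERY = """
SELECT rider_id, task_count,
       CAST(ROUND((julianday(max_last_status) - julianday(expected_time_to_arrive)) * 1440)
            AS INTEGER) AS minutes_delayed
FROM (
    SELECT rider_id, expected_time_to_arrive,
           SUM(CASE WHEN is_late = 1 THEN 1 ELSE 0 END)
               OVER (PARTITION BY rider_id) AS task_count,
           -- the earliest task supplies the first expected arrival time
           ROW_NUMBER() OVER (PARTITION BY rider_id ORDER BY task_created_time) AS num,
           MAX(last_status) OVER (PARTITION BY rider_id) AS max_last_status
    FROM myTestTable
) AS t
WHERE num = 1
ORDER BY rider_id
"""


def make_db():
    """Hypothetical helper: in-memory myTestTable filled with ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTestTable (rider_id INTEGER, task_created_time TEXT,"
        " expected_time_to_arrive TEXT, is_late INTEGER, last_status TEXT)"
    )
    conn.executemany("INSERT INTO myTestTable VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def hhmm(minutes):
    """Format a non-negative minute count as HH:MM."""
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}"


conn = make_db()
for rider_id, task_count, minutes in conn.execute(QUERY):
    print(rider_id, task_count, hhmm(minutes))
```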
I haven't used SQL in quite a while, so I'm a bit lost here. I want to find rows with duplicate values in the "Duration" and "date" columns and remove them from the query results. Of each set of duplicates, I need to keep the row where Status = "Transfer", since it holds more information about the call and how it was routed through our system.
I want to use this for a dashboard that counts the total number of calls from that query, which is why I cannot have both rows.
Here's the (simplified) code:
SELECT status, [user], duration, phonenumber, date
FROM (SELECT * FROM view_InboundPhoneCalls) as Phonecalls
WHERE date>=DATEADD(dd, -15, getdate())
--GROUP BY duration
This gives something like:
Status | User | Duration | phonenumber | date
Received | Receptionnist | 00:34:03 | from: +1234567890 | 2021-09-30 16:01:57
Received | Receptionnist | 00:03:12 | from: +9876543210 | 2021-09-30 16:02:40
Transfer | User1 | 00:05:12 | +14161654965;Receptionnist;User1 | 2021-09-30 16:01:57
Received | Receptionnist | 00:05:12 | from: +14161654965 | 2021-09-30 16:01:57
The end result would be something like this:
Status | User | Duration | phonenumber | date
Received | Receptionnist | 00:34:03 | from: +1234567890 | 2021-09-30 16:01:57
Received | Receptionnist | 00:03:12 | from: +9876543210 | 2021-09-30 16:02:40
Transfer | Receptionnist | 00:05:12 | +14161654965;Receptionnist;User1 | 2021-09-30 16:01:57
The normal "trick" is to detect duplicates first. One of the easier ways is a CTE (Common Table Expression) along with the ROW_NUMBER() function.
Part One - Mark the duplicates
WITH
cte_Sorted_List
(
status, usertype, duration, phonenumber, dated, duplicate_check
)
AS
( -- only use required fields to speed up
SELECT status, [user], duration, phonenumber, date,
-- marks depend on correct columns!
Row_Number() OVER
( -- duplicates are rows sharing the same date and duration
PARTITION BY date, duration
-- with correct sort order
-- bit of hack: As T comes after R
-- logic: mark records to show as row number 1 in duplicate list
ORDER BY status DESC
) AS duplicate_check
FROM view_InboundPhoneCalls
-- and lose all unnecessary data
WHERE date>=DATEADD(dd, -15, getdate())
)
Part two - show relevant rows
SELECT
status, usertype, duration, phonenumber, dated
FROM
cte_Sorted_List
WHERE
Duplicate_Check = 1
;
The CTE extracts the required fields in a single pass, and only that data is then used for the output.
You could go for a blacklist, say with a CTE, then filter out the undesired rows.
Something like:
WITH Blacklist ([date], [duration]) AS (
SELECT [date], [duration] FROM view_InboundPhoneCalls
GROUP BY [date], [duration]
Having count(*) > 1
)
SELECT Phonecalls.status, Phonecalls.[user], Phonecalls.duration, Phonecalls.phonenumber, Phonecalls.[date]
FROM
(SELECT * FROM view_InboundPhoneCalls) as Phonecalls
LEFT JOIN
Blacklist
ON Phonecalls.[date] = Blacklist.[date]
AND Phonecalls.[duration] = Blacklist.[duration]
Where
Blacklist.[date] is null
Or
(Blacklist.[date] is not null AND Phonecalls.[Status] = 'Transfer')
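The blacklist approach (GROUP BY ... HAVING COUNT(*) > 1, then a LEFT JOIN that keeps unmatched rows or Transfer rows) can be tried on the sample calls with SQLite through Python's sqlite3. Assumptions: the sample rows are loaded into a table named after the view, [user] becomes "user" in SQLite quoting, and the 15-day getdate() filter is left out because the sample dates are fixed.

```python
import sqlite3

# The sample calls from the question: (status, user, duration, phonenumber, date)
CALLS = [
    ("Received", "Receptionnist", "00:34:03", "from: +1234567890", "2021-09-30 16:01:57"),
    ("Received", "Receptionnist", "00:03:12", "from: +9876543210", "2021-09-30 16:02:40"),
    ("Transfer", "User1", "00:05:12", "+14161654965;Receptionnist;User1", "2021-09-30 16:01:57"),
    ("Received", "Receptionnist", "00:05:12", "from: +14161654965", "2021-09-30 16:01:57"),
]

BLACKLIST_QUERY = """
WITH Blacklist (date, duration) AS (
    -- (date, duration) combinations that occur more than once
    SELECT date, duration FROM view_InboundPhoneCalls
    GROUP BY date, duration
    HAVING COUNT(*) > 1
)
SELECT p.status, p."user", p.duration, p.phonenumber, p.date
FROM view_InboundPhoneCalls AS p
LEFT JOIN Blacklist AS b
       ON p.date = b.date AND p.duration = b.duration
-- keep unique calls, and only the Transfer row of a duplicated call
WHERE b.date IS NULL OR p.status = 'Transfer'
ORDER BY p.date, p.duration
"""


def make_db():
    """Hypothetical helper: sample calls in a table named like the view."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE view_InboundPhoneCalls (status TEXT, "user" TEXT,'
        " duration TEXT, phonenumber TEXT, date TEXT)"
    )
    conn.executemany("INSERT INTO view_InboundPhoneCalls VALUES (?, ?, ?, ?, ?)", CALLS)
    return conn


conn = make_db()
for row in conn.execute(BLACKLIST_QUERY):
    print(row)
```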
You can use row-numbering for this, along with a custom ordering. There is no need for any joins.
SELECT status, [user], duration, phonenumber, date
FROM (
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY duration, date
ORDER BY CASE WHEN Status = 'Transfer' THEN 1 ELSE 2 END)
FROM view_InboundPhoneCalls
WHERE date >= DATEADD(day, -15, getdate())
) as Phonecalls
WHERE rn = 1
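The row-numbering approach can be checked on the same sample calls with SQLite (3.25+) through Python's sqlite3. As before, the table named after the view, the "user" quoting, and the omission of the 15-day getdate() filter are assumptions of this sketch; the CASE in ORDER BY makes the Transfer row number 1 within each (duration, date) group.

```python
import sqlite3

# The sample calls from the question: (status, user, duration, phonenumber, date)
CALLS = [
    ("Received", "Receptionnist", "00:34:03", "from: +1234567890", "2021-09-30 16:01:57"),
    ("Received", "Receptionnist", "00:03:12", "from: +9876543210", "2021-09-30 16:02:40"),
    ("Transfer", "User1", "00:05:12", "+14161654965;Receptionnist;User1", "2021-09-30 16:01:57"),
    ("Received", "Receptionnist", "00:05:12", "from: +14161654965", "2021-09-30 16:01:57"),
]

DEDUP_QUERY = """
SELECT status, "user", duration, phonenumber, date
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY duration, date
               -- Transfer rows sort first, so they get rn = 1
               ORDER BY CASE WHEN status = 'Transfer' THEN 1 ELSE 2 END
           ) AS rn
    FROM view_InboundPhoneCalls
) AS Phonecalls
WHERE rn = 1
ORDER BY date, duration
"""


def make_db():
    """Hypothetical helper: sample calls in a table named like the view."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE view_InboundPhoneCalls (status TEXT, "user" TEXT,'
        " duration TEXT, phonenumber TEXT, date TEXT)"
    )
    conn.executemany("INSERT INTO view_InboundPhoneCalls VALUES (?, ?, ?, ?, ?)", CALLS)
    return conn


conn = make_db()
for row in conn.execute(DEDUP_QUERY):
    print(row)
```

Compared with the blacklist version, this needs no join and keeps exactly one row per (duration, date) even when a group has no Transfer row.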
I'm setting up a time series where each row represents 1 hour.
The input data sometimes has multiple values per hour, and the number varies.
Right now the specific code looks like this:
select
patientunitstayid
, generate_series(ceil(min(nursingchartoffset)/60.0),
ceil(max(nursingchartoffset)/60.0)) as hr
, avg(case when nibp_systolic >= 1 and nibp_systolic <= 250 then
nibp_systolic else null end) as nibp_systolic_avg
from nc
group by patientunitstayid
order by patientunitstayid asc;
It takes the average over the entire time series for each patient instead of for each hour. How can I fix this?
select nc.patientunitstayid, gs.hr,
avg(case when ceil(nc.nursingchartoffset/60.0) = gs.hr
and nc.nibp_systolic >= 1 and nc.nibp_systolic <= 250
then nc.nibp_systolic
end) as nibp_systolic_avg
from (select nc.*,
min(nursingchartoffset) over (partition by patientunitstayid) as min_nursingchartoffset,
max(nursingchartoffset) over (partition by patientunitstayid) as max_nursingchartoffset
from nc
) nc cross join lateral
generate_series(ceil(min_nursingchartoffset/60.0),
ceil(max_nursingchartoffset/60.0)
) as gs(hr)
group by nc.patientunitstayid, gs.hr
order by nc.patientunitstayid asc, gs.hr asc;
That is, you need to aggregate by hr, and each reading should only contribute to the hour it falls in. I put generate_series into the from clause to highlight that it generates rows. If you are using a version of Postgres older than 9.3, you won't have lateral joins; in that case, just use a subquery in the from clause.
EDIT:
You can also try:
from (select nc.*,
generate_series(ceil(min(nursingchartoffset) over (partition by patientunitstayid) / 60.0),
ceil(max(nursingchartoffset) over (partition by patientunitstayid)/ 60.0)
) hr
from nc
) nc
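Then adjust the references to hr in the outer query (nc.hr instead of gs.hr), still matching each reading to its own hour, e.g. ceil(nc.nursingchartoffset/60.0) = nc.hr.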
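The per-hour idea (one row per patient-hour, each reading counted only in its own hour) can be sketched in SQLite through Python's sqlite3. Because SQLite has no generate_series by default, this swaps in a recursive CTE that produces the hours and LEFT JOINs each reading to its hour; ceil() is emulated with CAST because SQLite's math functions are optional. The readings are hypothetical.

```python
import sqlite3

# Hypothetical readings: (patientunitstayid, nursingchartoffset in minutes, nibp_systolic)
READINGS = [
    (1, 10, 120),
    (1, 50, 130),
    (1, 70, 140),
    (1, 200, 300),  # outside 1..250, so ignored by the average
    (2, 5, 110),
]

HOURLY_AVG = """
WITH RECURSIVE readings AS (
    -- ceil(offset / 60.0): CAST truncates toward zero, then add 1 if a
    -- positive fractional part was cut off
    SELECT *,
           CAST(nursingchartoffset / 60.0 AS INTEGER)
             + (nursingchartoffset / 60.0 > CAST(nursingchartoffset / 60.0 AS INTEGER)) AS hr
    FROM nc
),
bounds AS (
    SELECT patientunitstayid, MIN(hr) AS lo, MAX(hr) AS hi
    FROM readings
    GROUP BY patientunitstayid
),
hours (patientunitstayid, hr, hi) AS (
    -- stands in for generate_series: one row per hour per patient
    SELECT patientunitstayid, lo, hi FROM bounds
    UNION ALL
    SELECT patientunitstayid, hr + 1, hi FROM hours WHERE hr < hi
)
SELECT h.patientunitstayid, h.hr,
       AVG(CASE WHEN r.nibp_systolic BETWEEN 1 AND 250 THEN r.nibp_systolic END)
           AS nibp_systolic_avg
FROM hours AS h
LEFT JOIN readings AS r
       ON r.patientunitstayid = h.patientunitstayid AND r.hr = h.hr
GROUP BY h.patientunitstayid, h.hr
ORDER BY h.patientunitstayid, h.hr
"""


def make_db():
    """Hypothetical helper: in-memory table `nc` filled with READINGS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nc (patientunitstayid INTEGER, nursingchartoffset INTEGER,"
        " nibp_systolic INTEGER)"
    )
    conn.executemany("INSERT INTO nc VALUES (?, ?, ?)", READINGS)
    return conn


conn = make_db()
for row in conn.execute(HOURLY_AVG):
    print(row)
```

Hours without any valid reading come out as NULL (None in Python) instead of disappearing, which keeps the series at one row per hour.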
I have an impression event table that has a bunch of timestamps and marked start/end boundaries. I am trying to roll it up to have a metric that says "this session contains at least 1 impression with feature x". I'm not sure how exactly to do this. Any help would be appreciated. Thanks.
I want to roll this up into something that looks like:
account, session_start, session_end, interacted_with_feature
3004514, 2018-02-23 13:43:35.475, 2018-02-23 13:43:47.377, FALSE
where it is simple for me to say if this session had any interactions with the feature or not.
Perhaps aggregation does what you want, if each account has a single session:
select account, min(timestamp), max(timestamp), max(interacted_with_feature)
from t
group by account;
I was able to solve this with conditional cumulative sums to generate a session group ID for each row.
with cte as (
select *
, sum(case when session_boundary = 'start' then 1 else 0 end)
over (partition by account order by timestamp rows unbounded preceding)
as session_num
from raw_sessions
)
select account
, session_num
, min(timestamp) as session_start
, max(timestamp) as session_end
, bool_or(interacted_with_feature) as interacted_with_feature
from cte
group by account, session_num
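The cumulative-sum sessionization can be exercised on hypothetical events in SQLite through Python's sqlite3. Assumptions: SQLite has no bool_or, so MAX over a 0/1 column stands in for it, and the timestamp column is double-quoted; every 'start' row opens a new session number that the following rows inherit.

```python
import sqlite3

# Hypothetical impression events for one account with two sessions:
# (account, timestamp, session_boundary, interacted_with_feature as 0/1)
EVENTS = [
    (3004514, "2018-02-23 13:43:35.475", "start", 0),
    (3004514, "2018-02-23 13:43:40.000", None, 0),
    (3004514, "2018-02-23 13:43:47.377", "end", 0),
    (3004514, "2018-02-23 14:00:00.000", "start", 0),
    (3004514, "2018-02-23 14:00:05.000", None, 1),
    (3004514, "2018-02-23 14:00:09.000", "end", 0),
]

SESSIONS = """
WITH cte AS (
    SELECT *,
           SUM(CASE WHEN session_boundary = 'start' THEN 1 ELSE 0 END)
               OVER (PARTITION BY account ORDER BY "timestamp" ROWS UNBOUNDED PRECEDING)
               AS session_num
    FROM raw_sessions
)
SELECT account, session_num,
       MIN("timestamp") AS session_start,
       MAX("timestamp") AS session_end,
       MAX(interacted_with_feature) AS interacted_with_feature  -- bool_or stand-in
FROM cte
GROUP BY account, session_num
ORDER BY account, session_num
"""


def make_db():
    """Hypothetical helper: in-memory table `raw_sessions` filled with EVENTS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE raw_sessions (account INTEGER, "timestamp" TEXT,'
        " session_boundary TEXT, interacted_with_feature INTEGER)"
    )
    conn.executemany("INSERT INTO raw_sessions VALUES (?, ?, ?, ?)", EVENTS)
    return conn


conn = make_db()
for row in conn.execute(SESSIONS):
    print(row)
```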