Pivot DateTime fields by date - sql

I have a table that contains employee 'punches' (clock ins/outs). Each punch can be an 'in' (punch_type=1) or an 'out' (punch_type=2).
The table is formatted as follows:
emp_num | report_date | punch_time | punch_type
-----------------------------------------------------------
1 | 2018-04-20 |2018-04-20 04:46:00.000 | 1
1 | 2018-04-20 |2018-04-20 06:58:00.000 | 2
1 | 2018-04-20 |2018-04-20 08:10:00.000 | 1
1 | 2018-04-20 |2018-04-20 12:00:00.000 | 2
I am trying to get the first 'punch' (clock in) and the following 'punch' (clock out) in the same row. Any later punches should be paired the same way.
Desired output:
emp_num | report_date | punch_in | punch_out
-----------------------------------------------------------
1 | 2018-04-20 |2018-04-20 04:46:00.000 | 2018-04-20 06:58:00.000
1 | 2018-04-20 |2018-04-20 08:10:00.000 | 2018-04-20 12:00:00.000
Keep in mind there may be multiple punch in/out combos in one day as shown in the example.
Any help would be greatly appreciated!

First you want to know which punch out time belongs to which punch in time. Answer: the nth punch out time belongs to the nth punch in time. So number your records:
select
p_in.emp_num,
p_in.report_date,
p_in.punch_time as punch_in,
p_out.punch_time as punch_out
from
(
select
emp_num,
report_date,
punch_time,
row_number() over (partition by emp_num, report_date order by punch_time) as rn
from mytable
where punch_type = 1
) p_in
left join
(
select
emp_num,
report_date,
punch_time,
row_number() over (partition by emp_num, report_date order by punch_time) as rn
from mytable
where punch_type = 2
) p_out on p_out.emp_num = p_in.emp_num
and p_out.report_date = p_in.report_date
and p_out.rn = p_in.rn
order by p_in.emp_num, p_in.report_date, punch_in;
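To try the nth-in/nth-out pairing without the real database, here is a minimal sketch that runs the query above against the question's sample rows. It assumes Python's built-in sqlite3 module backed by SQLite 3.25 or later (needed for window functions); SQLite is only a stand-in here, and the in-memory table is hypothetical.

```python
import sqlite3

# Hypothetical in-memory copy of the question's table; SQLite stands in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (emp_num int, report_date text, punch_time text, punch_type int)"
)
conn.executemany(
    "insert into mytable values (?, ?, ?, ?)",
    [
        (1, "2018-04-20", "2018-04-20 04:46:00", 1),
        (1, "2018-04-20", "2018-04-20 06:58:00", 2),
        (1, "2018-04-20", "2018-04-20 08:10:00", 1),
        (1, "2018-04-20", "2018-04-20 12:00:00", 2),
    ],
)

# Number the ins and the outs separately per employee and day,
# then join the nth in to the nth out.
PAIR_SQL = """
select p_in.emp_num, p_in.report_date,
       p_in.punch_time as punch_in,
       p_out.punch_time as punch_out
from (
    select emp_num, report_date, punch_time,
           row_number() over (partition by emp_num, report_date order by punch_time) as rn
    from mytable
    where punch_type = 1
) p_in
left join (
    select emp_num, report_date, punch_time,
           row_number() over (partition by emp_num, report_date order by punch_time) as rn
    from mytable
    where punch_type = 2
) p_out on p_out.emp_num = p_in.emp_num
       and p_out.report_date = p_in.report_date
       and p_out.rn = p_in.rn
order by p_in.emp_num, p_in.report_date, punch_in
"""

punch_pairs = conn.execute(PAIR_SQL).fetchall()
for row in punch_pairs:
    print(row)
```

Because of the left join, a punch in without a matching punch out still appears, with punch_out left null.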

select emp_num, report_date, max(case when punch_type=1 then punch_time else null end) punch_in,
max(case when punch_type=2 then punch_time else null end) punch_out
from (select *, row_number() over(partition by emp_num, report_date, punch_type order by emp_num, report_date, punch_time) value from yourtable )a
group by emp_num, report_date, value

Related

Oracle Pivot Help based on Data

I am trying to use the Oracle PIVOT function to display the data in the format below. I have tried examples I found on Stack Overflow, but I am unable to achieve what I am looking for.
With t as
(
select 1335 as emp_id, 'ADD Insurance New' as suuid, sysdate- 10 as startdate, null as enddate from dual
union all
select 1335 as emp_id, 'HS' as suuid, sysdate- 30 as startdate, null as enddate from dual
union all
select 1335 as emp_id, 'ADD Ins' as suuid, sysdate- 30 as startdate, Sysdate - 10 as enddate from dual
)
select * from t
output:
+--------+-------------------+-------------------+---------+-------------------+
| EMP_ID | SUUID_1 | SUUID_1_STARTDATE | SUUID_2 | SUUID_2_STARTDATE |
+--------+-------------------+-------------------+---------+-------------------+
| 1335 | ADD Insurance New | 10/5/2020 15:52 | HS | 9/15/2020 15:52 |
+--------+-------------------+-------------------+---------+-------------------+
Can anyone suggest how to use SQL PIVOT to get this format?
You can use conditional aggregation. There is more than one way to understand your question, but one approach that would work for your sample data is:
select emp_id,
max(case when rn = 1 then suuid end) suuid_1,
max(case when rn = 1 then startdate end) suid_1_startdate,
max(case when rn = 2 then suuid end) suuid_2,
max(case when rn = 2 then startdate end) suid_2_startdate
from (
select t.*, row_number() over(partition by emp_id order by startdate desc) rn
from t
where enddate is null
) t
group by emp_id
Demo on DB Fiddle:
EMP_ID | SUUID_1 | SUID_1_STARTDATE | SUUID_2 | SUID_2_STARTDATE
-----: | :---------------- | :--------------- | :------ | :---------------
1335 | ADD Insurance New | 05-OCT-20 | HS | 15-SEP-20
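The same conditional aggregation can be checked outside Oracle. The sketch below is an illustration only: it assumes Python's sqlite3 module (SQLite 3.25 or later) and replaces the sysdate arithmetic with fixed dates chosen to match the output above.

```python
import sqlite3

# Hypothetical stand-in for the question's CTE, with fixed dates instead of sysdate offsets.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (emp_id int, suuid text, startdate text, enddate text)")
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        (1335, "ADD Insurance New", "2020-10-05", None),
        (1335, "HS", "2020-09-15", None),
        (1335, "ADD Ins", "2020-09-15", "2020-10-05"),
    ],
)

# Rank the open rows (enddate is null) per employee, newest first,
# then spread rank 1 and rank 2 into separate columns.
PIVOT_SQL = """
select emp_id,
       max(case when rn = 1 then suuid end)     as suuid_1,
       max(case when rn = 1 then startdate end) as suuid_1_startdate,
       max(case when rn = 2 then suuid end)     as suuid_2,
       max(case when rn = 2 then startdate end) as suuid_2_startdate
from (
    select t.*,
           row_number() over (partition by emp_id order by startdate desc) as rn
    from t
    where enddate is null
) ranked
group by emp_id
"""

pivoted = conn.execute(PIVOT_SQL).fetchall()
print(pivoted)
```

Each additional output pair needs its own pair of case expressions, so the number of pivoted columns has to be known when the query is written.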
You can do it with PIVOT:
With t ( emp_id, suuid, startdate, enddate ) as
(
select 1335, 'ADD Insurance New', sysdate- 10, null from dual union all
select 1335, 'HS', sysdate- 30, null from dual union all
select 1335, 'ADD Ins', sysdate- 30, Sysdate - 10 from dual
)
SELECT emp_id,
"1_SUUID" AS suuid1,
"1_STARTDATE" AS suuid_startdate1,
"2_SUUID" AS suuid2,
"2_STARTDATE" AS suuid_startdate2
FROM (
SELECT t.*,
ROW_NUMBER() OVER ( ORDER BY startdate DESC, enddate DESC NULLS FIRST )
AS rn
FROM t
)
PIVOT (
MAX( suuid ) AS suuid,
MAX( startdate ) AS startdate,
MAX( enddate ) AS enddate
FOR rn IN ( 1, 2 )
)
Outputs:
EMP_ID | SUUID1 | SUUID_STARTDATE1 | SUUID2 | SUUID_STARTDATE2
-----: | :---------------- | :--------------- | :----- | :---------------
1335 | ADD Insurance New | 05-OCT-20 | HS | 15-SEP-20

SQL- Return rows after nth occurrence of event per user

I'm using PostgreSQL 8.0 and I have a table with user_id, timestamp, and event_id columns.
How can I return the rows (or row) after the 4th occurrence of event_id = someID per user?
|---------------------|--------------------|------------------|
| user_id | timestamp | event_id |
|---------------------|--------------------|------------------|
| 1 | 2020-04-02 12:00 | 11 |
|---------------------|--------------------|------------------|
| 2 | 2020-04-02 13:00 | 11 |
|---------------------|--------------------|------------------|
| 2 | 2020-04-02 14:00 | 99 |
|---------------------|--------------------|------------------|
| 2 | 2020-04-02 15:00 | 11 |
|---------------------|--------------------|------------------|
| 2 | 2020-04-02 16:00 | 11 |
|---------------------|--------------------|------------------|
| 2 | 2020-04-02 17:00 | 11 |
|---------------------|--------------------|------------------|
| 2 | 2020-04-02 17:00 | 11 |
|---------------------|--------------------|------------------|
I.e., if event_id = 11, I would only want the last row in the table above.
You can use window functions:
select *
from (
select t.*, row_number() over(partition by user_id, event_id order by timestamp) rn
from mytable t
) t
where rn > 4
Here is a little trick that removes the row number from the result:
select (t).*
from (
select t, row_number() over(partition by user_id, event_id order by timestamp) rn
from mytable t
) x
where rn > 4
You can use a cumulative count. This version includes the 4th occurrence:
select t.*
from (select t.*,
count(*) filter (where event_id = 11) over (partition by user_id order by timestamp) as event_11_cnt
from t
) t
where event_11_cnt >= 4;
The filter clause has been valid Postgres syntax since version 9.4. On older versions, you can use this instead:
select t.*
from (select t.*,
sum( (event_id = 11)::int ) over (partition by user_id order by timestamp) as event_11_cnt
from t
) t
where event_11_cnt >= 4;
To exclude the 4th occurrence itself, use this where clause instead:
where event_11_cnt > 4 or (event_11_cnt = 4 and event_id <> 11)
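Here is a small sketch of the cumulative count with both where clauses side by side. It assumes Python's sqlite3 module (SQLite 3.25 or later) in place of Postgres, a hypothetical table named events, and a last sample row moved to 18:00 so that no two rows of a user share a timestamp (with ties, the default window frame counts all tied rows together).

```python
import sqlite3

# Hypothetical sample data modelled on the question; the last timestamp is changed to avoid ties.
conn = sqlite3.connect(":memory:")
conn.execute("create table events (user_id int, ts text, event_id int)")
conn.executemany(
    "insert into events values (?, ?, ?)",
    [
        (1, "2020-04-02 12:00", 11),
        (2, "2020-04-02 13:00", 11),
        (2, "2020-04-02 14:00", 99),
        (2, "2020-04-02 15:00", 11),
        (2, "2020-04-02 16:00", 11),
        (2, "2020-04-02 17:00", 11),
        (2, "2020-04-02 18:00", 11),
    ],
)

# Running count of event 11 per user; the predicate is swapped in below.
CUMULATIVE_SQL = """
select user_id, ts, event_id
from (
    select e.*,
           sum(case when event_id = 11 then 1 else 0 end)
               over (partition by user_id order by ts) as event_11_cnt
    from events e
) counted
where {predicate}
order by user_id, ts
"""


def rows_matching(predicate):
    """Run the cumulative-count query with the given where clause."""
    return conn.execute(CUMULATIVE_SQL.format(predicate=predicate)).fetchall()


# Rows from the 4th occurrence onwards.
from_fourth = rows_matching("event_11_cnt >= 4")
# Rows strictly after the 4th occurrence.
after_fourth = rows_matching(
    "event_11_cnt > 4 or (event_11_cnt = 4 and event_id <> 11)"
)
print(from_fourth)
print(after_fourth)
```

The predicate is formatted into the SQL text only because it is a fixed string in this sketch; values coming from users should be bound as parameters.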
An alternative method:
select t.*
from t
where t.timestamp > (select t2.timestamp
from t t2
where t2.user_id = t.user_id and
t2.event_id = 11
order by t2.timestamp
limit 1 offset 3
);
Sorry to be asking about such an old version of Postgres. Here is an answer that worked:
WITH EventOrdered AS(
SELECT
EventTypeId
, UserId
, Timestamp
, ROW_NUMBER() OVER (PARTITION BY EventTypeId, UserId ORDER BY Timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) ROW_NO
FROM Event),
FourthEvent AS (
SELECT DISTINCT
UserID
, FIRST_VALUE(TimeStamp) OVER (PARTITION BY UserId ORDER BY Timestamp) FirstFourthEventTimestamp
FROM EventOrdered
WHERE ROW_NO = 4)
SELECT e.*
FROM Event e
JOIN FourthEvent ffe
ON e.UserId = ffe.UserId
AND e.Timestamp > ffe.FirstFourthEventTimestamp
ORDER BY e.UserId, e.Timestamp

SQL: Get date difference between rows in the same column [duplicate]

I am trying to create a report and this is my input data.
Stage Name Date
1 x 12/05/2019 10:00:03
1 x 12/05/2019 10:05:01
1 y 12/06/2019 12:00:07
2 x 12/06/2019 13:12:03
2 x 12/06/2019 13:23:00
1 y 12/08/2019 16:00:07
2 x 12/09/2019 09:17:59
This is my desired output.
Stage Name DateFrom DateTo DateDiff
1 x 12/05/2019 10:00:03 12/06/2019 12:00:07 1
1 y 12/06/2019 12:00:07 12/06/2019 13:12:03 0
2 x 12/06/2019 13:12:03 12/08/2019 16:00:07 2
1 y 12/08/2019 16:00:07 12/09/2019 09:17:59 1
I cannot use group by clause over stage and name, since it will group the 3rd and 6th rows from my input. I tried joining the table to itself, but I am not getting the desired result. Is this even possible in SQL ? Any ideas would be helpful. I am using Microsoft SQL Server.
This is a variation of the gaps-and-islands problem. You want to group together adjacent rows (i.e., those having the same stage and name), but use the start date of the next group as the ending date of the current group.
Here is one way to do it:
select
stage,
name,
min(date) date_from,
lead(min(date)) over(order by min(date)) date_to,
datediff(day, min(date), lead(min(date)) over(order by min(date))) date_diff
from (
select
t.*,
row_number() over(order by date) rn1,
row_number() over(partition by stage, name order by date) rn2
from mytable t
) t
group by stage, name, rn1 - rn2
order by date_from
Demo on DB Fiddle:
stage | name | date_from | date_to | datediff
----: | :--- | :------------------ | :------------------ | -------:
1 | x | 12/05/2019 10:00:03 | 12/06/2019 12:00:07 | 1
1 | y | 12/06/2019 12:00:07 | 12/06/2019 13:12:03 | 0
2 | x | 12/06/2019 13:12:03 | 12/08/2019 16:00:07 | 2
1 | y | 12/08/2019 16:00:07 | 12/09/2019 09:17:59 | 1
2 | x | 12/09/2019 09:17:59 | null | null
Note that this does not produce exactly the result that you showed: there is an additional, pending record at the end of the resultset, that represents the "on-going" series of records. If needed, you can filter it out by nesting the query:
select *
from (
select
stage,
name,
min(date) date_from,
lead(min(date)) over(order by min(date)) date_to,
datediff(day, min(date), lead(min(date)) over(order by min(date))) date_diff
from (
select
t.*,
row_number() over(order by date) rn1,
row_number() over(partition by stage, name order by date) rn2
from mytable t
) t
group by stage, name, rn1 - rn2
) t
where date_to is not null
order by date_from
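The row-number difference can be traced on the question's data with a short sketch. It assumes Python's sqlite3 module (SQLite 3.25 or later) rather than SQL Server, ISO-formatted timestamps in a hypothetical column named ts, and a julianday-based day count standing in for datediff(day, ...). The grouping is done in a CTE first and lead() is applied afterwards, which is a restructuring of the query above, not the same statement.

```python
import sqlite3

# Hypothetical copy of the question's input, with dates rewritten in ISO format.
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (stage int, name text, ts text)")
conn.executemany(
    "insert into mytable values (?, ?, ?)",
    [
        (1, "x", "2019-12-05 10:00:03"),
        (1, "x", "2019-12-05 10:05:01"),
        (1, "y", "2019-12-06 12:00:07"),
        (2, "x", "2019-12-06 13:12:03"),
        (2, "x", "2019-12-06 13:23:00"),
        (1, "y", "2019-12-08 16:00:07"),
        (2, "x", "2019-12-09 09:17:59"),
    ],
)

# rn1 - rn2 stays constant within a run of adjacent rows sharing stage and name.
ISLANDS_SQL = """
with numbered as (
    select stage, name, ts,
           row_number() over (order by ts) as rn1,
           row_number() over (partition by stage, name order by ts) as rn2
    from mytable
),
islands as (
    select stage, name, min(ts) as date_from
    from numbered
    group by stage, name, rn1 - rn2
)
select stage, name, date_from,
       lead(date_from) over (order by date_from) as date_to,
       cast(julianday(date(lead(date_from) over (order by date_from)))
            - julianday(date(date_from)) as integer) as date_diff
from islands
order by date_from
"""

islands = conn.execute(ISLANDS_SQL).fetchall()
for row in islands:
    print(row)
```

The last island has no successor, so its date_to and date_diff come back as null, just like the pending record described above.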
This is a variation of the gaps-and-islands problem, but it has a pretty simple solution.
Just keep every row where the previous row has a different stage or name. Then use lead() to get the next date. Here is the basic idea:
select t.stage, t.name, t.date as datefrom,
lead(t.date) over (order by t.date) as dateto,
datediff(day, t.date, lead(t.date) over (order by t.date)) as diff
from (select t.*,
lag(date) over (partition by stage, name order by date) as prev_sn_date,
lag(date) over (order by date) as prev_date
from t
) t
where prev_sn_date <> prev_date or prev_sn_date is null;
If you really want to filter out the last row, you need one more step; I'm not sure if that is desirable.

SQL: Filter rows by direction

I have a table with two columns: date (timestamp) and status (boolean).
I have a lot of values like:
| date | status |
|-------------------------- |-------- |
| 2018-11-05T19:04:21.125Z | true |
| 2018-11-05T19:04:22.125Z | true |
| 2018-11-05T19:04:23.125Z | true |
....
I need to get a result like this:
| date_from | date_to | status |
|-------------------------- |-------------------------- |-------- |
| 2018-11-05T19:04:21.125Z | 2018-11-05T19:04:27.125Z | true |
| 2018-11-05T19:04:27.125Z | 2018-11-05T19:04:47.125Z | false |
| 2018-11-05T19:04:47.125Z | 2018-11-05T19:04:57.125Z | true |
So I need to collapse all consecutive rows with the same value and return only the periods during which the status was true or false.
I wrote a query like this:
SELECT max("current_date"), current_status, previous_status
FROM (SELECT date as "current_date",
status as current_status,
(lag(status, 1) OVER (ORDER BY msgtime))::boolean AS previous_status
FROM "table" as table
) as raw_data
group by current_status, previous_status
but the response never contains more than 4 values.
This is a gaps-and-islands problem. A typical method uses the difference of row numbers:
select min(date), max(date), status
from (select t.*,
row_number() over (order by date) as seqnum,
row_number() over (partition by status order by date) as seqnum_s
from t
) t
group by status, (seqnum - seqnum_s);
Yes, you could use LAG, but then you also need a running counter that increments every time the status changes:
WITH cte1 AS (
SELECT date, status, CASE WHEN LAG(status) OVER (ORDER BY date) = status THEN 0 ELSE 1 END AS chg
FROM yourdata
), cte2 AS (
SELECT date, status, SUM(chg) OVER (ORDER BY date) AS grp
FROM cte1
)
SELECT MIN(date) AS date_from, MAX(date) AS date_to, status
FROM cte2
GROUP BY grp, status
ORDER BY date_from
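To see how the change flag and the running counter interact, here is a minimal sketch. It assumes Python's sqlite3 module (SQLite 3.25 or later), a hypothetical table yourdata with made-up timestamps in a column named ts, and the boolean status stored as 0/1.

```python
import sqlite3

# Made-up sample rows: three status runs (true, false, true); status stored as 1/0.
conn = sqlite3.connect(":memory:")
conn.execute("create table yourdata (ts text, status int)")
conn.executemany(
    "insert into yourdata values (?, ?)",
    [
        ("2018-11-05T19:04:21.125Z", 1),
        ("2018-11-05T19:04:22.125Z", 1),
        ("2018-11-05T19:04:23.125Z", 1),
        ("2018-11-05T19:04:27.125Z", 0),
        ("2018-11-05T19:04:28.125Z", 0),
        ("2018-11-05T19:04:47.125Z", 1),
        ("2018-11-05T19:04:57.125Z", 1),
    ],
)

# chg is 1 on the first row and whenever the status differs from the previous row;
# the running sum of chg then labels each run with its own group number.
RUNS_SQL = """
with cte1 as (
    select ts, status,
           case when lag(status) over (order by ts) = status then 0 else 1 end as chg
    from yourdata
), cte2 as (
    select ts, status, sum(chg) over (order by ts) as grp
    from cte1
)
select min(ts) as date_from, max(ts) as date_to, status
from cte2
group by grp, status
order by date_from
"""

runs = conn.execute(RUNS_SQL).fetchall()
for row in runs:
    print(row)
```

Note that date_to here is the last timestamp inside each run. If a period should instead end where the next one starts, as in the question's desired output, lead() over the run start dates would be needed on top of this.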

Count and pivot a table by date

I would like to identify the returning customers from an Oracle(11g) table like this:
CustID | Date
-------|----------
XC321 | 2016-04-28
AV626 | 2016-05-18
DX970 | 2016-06-23
XC321 | 2016-05-28
XC321 | 2016-06-02
So I can see which customers returned within various windows, for example within 10, 20, 30, 40 or 50 days. For example:
CustID | 10_day | 20_day | 30_day | 40_day | 50_day
-------|--------|--------|--------|--------|--------
XC321 | | | 1 | |
XC321 | | | | 1 |
I would even accept a result like this:
CustID | Date | days_from_last_visit
-------|------------|---------------------
XC321 | 2016-05-28 | 30
XC321 | 2016-06-02 | 5
I guess it would use a partition by windowing clause with unbounded following and preceding clauses... but I cannot find any suitable examples.
Any ideas...?
Thanks
No need for a windowing clause with unbounded bounds here; LEAD() plus conditional aggregation using a CASE expression is enough:
-- "date" is a reserved word in Oracle; substitute the real column name (quoted if necessary)
SELECT t.custID,
COUNT(CASE WHEN (t.date - t.last_visit) <= 10 THEN 1 END) as "10_day",
COUNT(CASE WHEN (t.date - t.last_visit) between 11 and 20 THEN 1 END) as "20_day",
COUNT(CASE WHEN (t.date - t.last_visit) between 21 and 30 THEN 1 END) as "30_day",
.....
FROM (SELECT s.custID,
s.date,
LEAD(s.date) OVER(PARTITION BY s.custID ORDER BY s.date DESC) as last_visit
FROM YourTable s) t
GROUP BY t.custID
Oracle Setup:
CREATE TABLE customers ( CustID, Activity_Date ) AS
SELECT 'XC321', DATE '2016-04-28' FROM DUAL UNION ALL
SELECT 'AV626', DATE '2016-05-18' FROM DUAL UNION ALL
SELECT 'DX970', DATE '2016-06-23' FROM DUAL UNION ALL
SELECT 'XC321', DATE '2016-05-28' FROM DUAL UNION ALL
SELECT 'XC321', DATE '2016-06-02' FROM DUAL;
Query:
SELECT *
FROM (
SELECT CustID,
Activity_Date AS First_Date,
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '10' DAY FOLLOWING )
- 1 AS "10_Day",
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '20' DAY FOLLOWING )
- 1 AS "20_Day",
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '30' DAY FOLLOWING )
- 1 AS "30_Day",
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '40' DAY FOLLOWING )
- 1 AS "40_Day",
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '50' DAY FOLLOWING )
- 1 AS "50_Day",
ROW_NUMBER() OVER ( PARTITION BY CustID ORDER BY Activity_Date ) AS rn
FROM Customers
)
WHERE rn = 1;
Output
CUSTID FIRST_DATE 10_Day 20_Day 30_Day 40_Day 50_Day RN
------ ------------------- ---------- ---------- ---------- ---------- ---------- ----------
AV626 2016-05-18 00:00:00 0 0 0 0 0 1
DX970 2016-06-23 00:00:00 0 0 0 0 0 1
XC321 2016-04-28 00:00:00 0 0 1 2 2 1
Here is an answer that works for me. I based it on the answers above; thanks to MT0 and Sagi for their contributions:
SELECT CustID,
visit_date,
Prev_Visit ,
Days_between_visits ,
COUNT( CASE WHEN (Days_between_visits) <=10 THEN 1 END) AS "0-10_day" ,
COUNT( CASE WHEN (Days_between_visits) BETWEEN 11 AND 20 THEN 1 END) AS "11-20_day" ,
COUNT( CASE WHEN (Days_between_visits) BETWEEN 21 AND 30 THEN 1 END) AS "21-30_day" ,
COUNT( CASE WHEN (Days_between_visits) BETWEEN 31 AND 40 THEN 1 END) AS "31-40_day" ,
COUNT( CASE WHEN (Days_between_visits) BETWEEN 41 AND 50 THEN 1 END) AS "41-50_day" ,
COUNT( CASE WHEN (Days_between_visits) >50 THEN 1 END) AS "51+_day"
FROM
(SELECT CustID,
visit_date,
Lead(T1.visit_date) over (partition BY T1.CustID order by T1.visit_date DESC) AS Prev_visit,
visit_date - Lead(T1.visit_date) over (
partition BY T1.CustID order by T1.visit_date DESC) AS Days_between_visits
FROM T1
) T2
WHERE Days_between_visits >0
GROUP BY T2.CustID ,
T2.visit_date ,
T2.Prev_visit ,
T2.Days_between_visits;
This returns:
CUSTID | VISIT_DATE | PREV_VISIT | DAYS_BETWEEN_VISITS | 0-10_DAY | 11-20_DAY | 21-30_DAY | 31-40_DAY | 41-50_DAY | 51+_DAY
XC321 | 2016-05-28 | 2016-04-28 | 30 | | | 1 | | |
XC321 | 2016-06-02 | 2016-05-28 | 5 | 1 | | | | |
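For the simpler days_from_last_visit result mentioned in the question, a previous-row lookup per customer is enough. The sketch below is an illustration under assumptions: Python's sqlite3 module (SQLite 3.25 or later) instead of Oracle 11g, a hypothetical table named visits, lag() over ascending dates in place of lead() over descending ones, and julianday() for the date subtraction.

```python
import sqlite3

# Hypothetical copy of the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute("create table visits (CustID text, visit_date text)")
conn.executemany(
    "insert into visits values (?, ?)",
    [
        ("XC321", "2016-04-28"),
        ("AV626", "2016-05-18"),
        ("DX970", "2016-06-23"),
        ("XC321", "2016-05-28"),
        ("XC321", "2016-06-02"),
    ],
)

# lag() fetches the previous visit of the same customer;
# first visits have no predecessor and are filtered out.
RETURN_SQL = """
select CustID, visit_date, prev_visit,
       cast(julianday(visit_date) - julianday(prev_visit) as integer) as days_from_last_visit
from (
    select CustID, visit_date,
           lag(visit_date) over (partition by CustID order by visit_date) as prev_visit
    from visits
) v
where prev_visit is not null
order by CustID, visit_date
"""

returns = conn.execute(RETURN_SQL).fetchall()
for row in returns:
    print(row)
```

From days_from_last_visit, the 10/20/30-day columns follow by wrapping the value in one case expression per bucket, as in the queries above.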