SQL finding no activity between dates - sql

I am trying to find, using Postgres, how many days the company in the EmployeeActivity table went without any activity, meaning no employee joining and no employee leaving. A NULL DateLeave means the employee is still working at the company, while a non-NULL DateLeave is the date the employee left.
DateJoined   DateLeave    Name
----------------------------------
2012-06-20   NULL         Terrence
2012-06-21   2013-06-23   Mady
2010-06-20   2012-06-24   Greg
2013-06-20   NULL         Matt
My attempt was:
select EXTRACT(DAY FROM MAX(EmployeeActivity.DateJoined) - MIN(EmployeeActivity.DateLeave)
From EmployeeActivity
WHERE EmployeeActivity.DateLeave IS 'NULL'
However, it returns the wrong value, especially for a longer table.
Expected output:
I want the query to return the longest period, in days, during which the company had no activity (no employee joining or leaving).

If I've understood correctly, the following should meet your needs:
SELECT
    ActivityDate,
    lag(ActivityDate) over (ORDER BY ActivityDate) as PreviousActivityDate,
    Date_Part('day', ActivityDate - lag(ActivityDate) over (ORDER BY ActivityDate)) as Difference
FROM
(
    select DateJoined as ActivityDate from EmployeeActivity
    union
    select coalesce(DateLeave, now()) from EmployeeActivity
) AllActivityDates
ORDER BY Difference DESC
LIMIT 1 OFFSET 1
The OFFSET 1 is there because the earliest activity date has no previous row, so its Difference is NULL. Postgres sorts NULLs first in descending order, so that row comes out on top and we simply skip it.
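To make the idea concrete outside the database, here is a minimal Python sketch of the same approach: gather every distinct join/leave date, sort them, and take the largest difference between neighbours. The function name `longest_quiet_period` and the fixed `today` argument (standing in for now()) are assumptions made for illustration, not part of the query above.

```python
from datetime import date


def longest_quiet_period(rows, today):
    """rows: (date_joined, date_leave_or_None). Returns (days, gap_start, gap_end)."""
    # Collect every distinct activity date; as in the query, a missing
    # DateLeave is replaced by `today` (the coalesce(DateLeave, now()) step).
    dates = sorted({d for joined, left in rows for d in (joined, left or today)})
    # Pair each date with its predecessor (what lag() does) and keep the widest gap.
    gaps = [((cur - prev).days, prev, cur) for prev, cur in zip(dates, dates[1:])]
    return max(gaps) if gaps else None


rows = [
    (date(2012, 6, 20), None),              # Terrence
    (date(2012, 6, 21), date(2013, 6, 23)), # Mady
    (date(2010, 6, 20), date(2012, 6, 24)), # Greg
    (date(2013, 6, 20), None),              # Matt
]
# `today` is pinned to an arbitrary date so the result is reproducible.
result = longest_quiet_period(rows, today=date(2013, 7, 1))
print(result)  # (731, datetime.date(2010, 6, 20), datetime.date(2012, 6, 20))
```

Unlike the SQL, there is no NULL row to skip here, because zip() never produces a pair for the first date.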

Related

Calculate total working hours of employee based on swipe in/swipe out using Oracle SQL

I was recently given a task to calculate an employee's total office hours based on his card swipe in/swipe out. I have the following data :
id gate_1 gate_2 gate_3 gate_4
100 null null null 9:00
100 null 13:30 null null
100 null null 16:00 null
100 null null 18:00 null
Here, employee 100 comes in via gate_4 at 9:00, takes a break at 13:30 and goes out through gate_2, comes back at 16:00 through gate_3, and leaves the office at 18:00 through gate_3. How can I calculate the total time spent in the office from this data?
Thanks in advance.
As has been pointed out your data model is denormalized to not even satisfy 1st normal form. The first step is to correct that (doing so in a query). Then there is no indication as to swipe in or swipe out, therefore it must be assumed that the first swipe time is always in and the ins/outs always alternate properly. Finally there is no indication of multiple days being covered so the assumption is just 1 period. That is a lot of assumptions.
Since an Oracle data type date contains time as well as the date and summing differences is much easier than with timestamps I convert timestamp to date in the first step of normalizing the data. Given all this we arrive at: (See Demo)
with normal (emp_id, inout_tm) as
( select emp_id, cast(gate1 as date)
    from emp_gate_time
   where gate1 is not null
  union all
  select emp_id, cast(gate2 as date)
    from emp_gate_time
   where gate2 is not null
  union all
  select emp_id, cast(gate3 as date)
    from emp_gate_time
   where gate3 is not null
  union all
  select emp_id, cast(gate4 as date)
    from emp_gate_time
   where gate4 is not null
)
select emp_id, round(24.0*(sum(hours)),1) hours_in_office
  from ( select emp_id, (time_out - time_in) hours
           from ( select emp_id, inout_tm time_in, rn
                       , lead(inout_tm) over(partition by emp_id order by inout_tm) time_out
                    from ( select n.*
                                , row_number() over(partition by emp_id order by inout_tm) rn
                             from normal n
                         )
                )
          where mod(rn,2) = 1
       )
 group by emp_id;
Items of Interest:
Subquery Factoring (CTE)
Date Arithmetic - in Hours ... Difference Between Dates in hours ...
Oracle Analytic Functions - Row_number, lead
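The pairing logic of that query (odd-numbered swipes are "in", the next swipe is the matching "out") can be sketched in a few lines of Python. This is only an illustration under the same assumptions as the answer: one day of data, the first swipe is an entry, and swipes alternate properly. The name `hours_in_office` and the placeholder date are hypothetical.

```python
from datetime import datetime


def hours_in_office(swipes):
    """swipes: datetimes for one employee, in any order."""
    times = sorted(swipes)
    # 1st, 3rd, ... swipes are entries; 2nd, 4th, ... are the matching exits.
    # zip() drops a trailing unpaired entry, like the NULL lead() in the query.
    pairs = zip(times[0::2], times[1::2])
    total_seconds = sum((t_out - t_in).total_seconds() for t_in, t_out in pairs)
    return round(total_seconds / 3600, 1)


# The date part is a placeholder; the question only provides times of day.
swipes = [
    datetime.strptime("2020-01-01 " + t, "%Y-%m-%d %H:%M")
    for t in ("09:00", "13:30", "16:00", "18:00")
]
total = hours_in_office(swipes)
print(total)  # 6.5
```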
Your schema is denormalized: you have columns such as gate_1, gate_2, and so on, which is the wrong approach. A better design is to have a reference table of gates, for example:
id|gate_name
--|---------
And the table holding the employee data would look like this:
id_employee|id_gate|time
Then you can sort the rows in this table and compute the time between each pair of consecutive rows.
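As a rough illustration of what that normalization means for the sample data, the following Python sketch turns the wide gate_1..gate_4 rows into (id_employee, id_gate, time) rows. The helper name `unpivot` and the dictionary layout are assumptions for this example only.

```python
def unpivot(rows, gate_columns):
    """Turn one-column-per-gate rows into (id_employee, id_gate, time) tuples."""
    return [
        (row["id"], gate, row[gate])
        for row in rows
        for gate in gate_columns
        if row[gate] is not None  # keep only the gate that was actually swiped
    ]


gates = ["gate_1", "gate_2", "gate_3", "gate_4"]
wide_rows = [
    {"id": 100, "gate_1": None, "gate_2": None, "gate_3": None, "gate_4": "9:00"},
    {"id": 100, "gate_1": None, "gate_2": "13:30", "gate_3": None, "gate_4": None},
    {"id": 100, "gate_1": None, "gate_2": None, "gate_3": "16:00", "gate_4": None},
    {"id": 100, "gate_1": None, "gate_2": None, "gate_3": "18:00", "gate_4": None},
]
long_rows = unpivot(wide_rows, gates)
for r in long_rows:
    print(r)
```

Each output row carries exactly one swipe, so sorting by time and comparing consecutive rows becomes straightforward.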

Query to select appropriate row and calculate elapsed time

I need some help in coming up with a query that will return the answer to the question “How long has a Help Desk Ticket been owned by the currently assigned group?” Following is a subset of the data model with some sample data:
Help Desk Cases
Case ID (PK)   Assigned Person   Assigned Group
123456         Robert            Hardware
Help Desk Case Assignment History
Case ID (PK)   Seq # (PK)   Assigned Group   Assigned Person   Elapsed Time   Row Added Date/Time
123456         1            Hardware                           10
123456         2            Software                           2
123456         3            Hardware         Sam               1
123456         4            Software         Sophie            6
123456         5            Hardware                           8
123456         6            Hardware         Sam               3
123456         7            Hardware         Robert
The Elapsed Time column for the most recent row (Seq #7) is not updated until a subsequent row (Seq #8) is written, so I don’t think I can use an aggregate function. For the sample data above, I need to get the Row Added column from Seq # 5 and subtract it from the current date to get the total amount of time the case has been most recently assigned to the Hardware group (we ignore previous assignments such as Seq # 1 and Seq # 3).
The Query output for the example above should be:
Case ID   Assigned Group   Assigned Person   Time Owned
123456    Hardware         Robert            Current Date - Seq #5 Row Added Date/Time
With Oracle 12c and higher...
select case_id,
       last_assigned_group as assigned_group,
       last_assigned_person as assigned_person,
       nvl(last_row_added, systimestamp) - first_row_added as time_owned
from help_desk_case_assignment_history
match_recognize (
    partition by case_id
    order by seq#
    measures
        first(row_added) as first_row_added,
        last(row_added) as last_row_added,
        last(assigned_group) as last_assigned_group,
        last(assigned_person) as last_assigned_person
    one row per match
    after match skip past last row
    pattern (
        assignment_run* case_end
    )
    define
        assignment_run as (assigned_group = next(assigned_group)),
        case_end as (elapsed_time is null or next(assigned_group) is null)
)
;
In human words: Per each helpdesk case ID find the last uninterrupted "run" of assignments within the same group. For the last "run" of assignments identify its starting time, ending time, and ending person. And display the found values.
With Oracle 11g and lower...
with xyz as (
    select X.*,
           case when lnnvl(assigned_group = lag(assigned_group) over (partition by case_id order by seq#)) then seq# end as assignment_run_start
    from help_desk_case_assignment_history X
),
xyz2 as (
    select X.*,
           last_value(assignment_run_start) ignore nulls over (partition by case_id order by seq#) as assignment_run_id
    from xyz X
),
xyz3 as (
    select case_id, assigned_group, assignment_run_id,
           max(assigned_person) keep (dense_rank last order by seq#) as last_assigned_person,
           nvl(max(row_added) keep (dense_rank last order by seq#), systimestamp)
             - min(row_added) keep (dense_rank first order by seq#)
             as time_owned,
           row_number() over (partition by case_id order by assignment_run_id desc) as last_group_ind
    from xyz2 X
    group by case_id, assigned_group, assignment_run_id
)
select case_id, assigned_group, last_assigned_person as assigned_person, time_owned
from xyz3
where last_group_ind = 1
;
Perhaps ugly, but pretty straightforward and working.
In human words:
Identify the boundaries (starts) of assignment runs as increasing numeric IDs.
Extend the found assignment run starts to the whole assignment runs.
Calculate the assignments' run times and last assigned persons.
Restrict the previous calculation to the last (by their ID) assignment run only.
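The "last uninterrupted run" idea can also be sketched procedurally. The Python below handles a single case and follows the question's definition of Time Owned (current time minus the Row Added value of the first row in the final run). The Row Added timestamps are invented for illustration, since the sample data does not show them, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def time_owned_by_current_group(history, now):
    """history: dicts with seq, group, person, row_added for ONE case."""
    rows = sorted(history, key=lambda r: r["seq"])
    current = rows[-1]
    run_start = current
    # Walk backwards while the assigned group stays the same as the latest row.
    for row in reversed(rows):
        if row["group"] != current["group"]:
            break
        run_start = row
    return current["group"], current["person"], now - run_start["row_added"]


# Hypothetical timestamps: Seq # n was added on 2020-01-0n.
base = datetime(2020, 1, 1)
groups = ["Hardware", "Software", "Hardware", "Software", "Hardware", "Hardware", "Hardware"]
persons = [None, None, "Sam", "Sophie", None, "Sam", "Robert"]
history = [
    {"seq": i + 1, "group": g, "person": p, "row_added": base + timedelta(days=i)}
    for i, (g, p) in enumerate(zip(groups, persons))
]
owned = time_owned_by_current_group(history, now=datetime(2020, 1, 10))
print(owned)  # the run starts at Seq # 5, so 5 days with these made-up dates
```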

How to check if dates overlap on different lines in SQL Server?

I have a database with electricity meter readings. Sometimes people get a new meter and then their original meter gets an end date and the new meter gets a start date and the end date remains NULL. This can happen multiple times in a year and I want to know if there are no gaps in measurement. In other words, I need to figure out if end date 1 is the same as start date 2 and so on.
Sample data:
cust_id meter_id start_date end_date
--------------------------------------------------
a 1 2017-01-01 2017-05-02
a 2 2017-05-02 Null
b 3 2017-01-01 2017-06-01
b 4 2017-06-05 Null
This is what the data looks like and the result I am looking for is that for customer a the end date of meter 1 is equal to the start date of meter 2. For customer b however, there are 4 days between the end date of meter 3 and the start date of meter 4. That is something I want to flag.
I found customers for whom this can happen up to 8 times in the period I am researching. I tried something with nested queries and very complex cases but even I lost my way around it, so I was wondering if someone here has an idea of how to get to the answer a little smarter.
You can get the offending rows using lag(). Partition by customer only, so that each meter is compared with that customer's previous meter (partitioning by meter_id as well would leave every meter alone in its partition):
select r.*
from (select r.*,
             lag(end_date) over (partition by cust_id order by start_date) as prev_end_date,
             row_number() over (partition by cust_id order by start_date) as seqnum
      from readings r
     ) r
where prev_end_date <> start_date or (prev_end_date is null and seqnum > 1);
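For readers who want to check the logic against the sample data, here is a small Python sketch of the same comparison: within each customer, ordered by start date, flag every meter whose start date differs from the previous meter's end date (including a missing end date). It groups by cust_id alone, since each meter_id has a single row in the sample. The function name `find_meter_gaps` is an assumption for this example.

```python
from itertools import groupby


def find_meter_gaps(readings):
    """readings: (cust_id, meter_id, start_date, end_date), ISO date strings or None."""
    flagged = []
    ordered = sorted(readings, key=lambda r: (r[0], r[2]))
    for cust_id, rows in groupby(ordered, key=lambda r: r[0]):
        rows = list(rows)
        # Compare each row with the previous one, which is what lag() provides.
        for prev, cur in zip(rows, rows[1:]):
            if prev[3] != cur[2]:  # also true when the previous end date is None
                flagged.append((cust_id, cur[1], prev[3], cur[2]))
    return flagged


readings = [
    ("a", 1, "2017-01-01", "2017-05-02"),
    ("a", 2, "2017-05-02", None),
    ("b", 3, "2017-01-01", "2017-06-01"),
    ("b", 4, "2017-06-05", None),
]
gaps = find_meter_gaps(readings)
print(gaps)  # [('b', 4, '2017-06-01', '2017-06-05')]
```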
Guessing there is now a better way to pull this off using LEAD and LAG, but I wrote an article in SQL 2008R2 called T-SQL: Identify bad dates in a time series where you can modify the big cte in the middle of the article to handle your definition of a bad date.
Good luck. There's too much detail in the article to post in a single SO question, otherwise I'd do that here.

count occurrences for each week using db2

I am looking for some general advice rather than a solution. My problem is that I have a list of dates per person where due to administrative procedures, a person may have multiple records stored for this one instance, yet the date recorded is when the data was entered in as this person is passed through the paper trail. I understand this is quite difficult to explain so I'll give an example:
Person Date Audit
------ ---- -----
1 2000-01-01 A
1 2000-01-01 B
1 2000-01-02 C
1 2003-04-01 A
1 2003-04-03 A
where I want to know how many valid records a person has by removing annoying audits that have recorded the date as the day the data was entered, rather than the date the person first arrives in the dataset. So for the above person I am only interested in:
Person Date Audit
------ ---- -----
1 2000-01-01 A
1 2003-04-01 A
What makes this problem difficult is that I do not have the luxury of an audit column (the audit column here is just to show how the data is collected). I merely have dates. So one crude way to count real events (and remove repeated audit data) is to look at individual weeks within a person's history and, if one or more records exist for a given week, add 1 to my counter. This way, even though there are multiple records spread over a few days, I am only counting the succession of dates as one record (which, after all, I am counting by date).
So does anyone know of any db2 functions that could help me solve this problem?
If you can live with standard weeks it's pretty simple:
select
person, year(dt), week(dt), min(dt), min(audit)
from
blah
group by
person, year(dt), week(dt)
If you need seven-day ranges starting with the first date you'd need to generate your own week numbers, a calendar of sorts, e.g. like so:
with minmax(mindt, maxdt) as ( -- date range of the "calendar"
    select min(dt), max(dt)
    from blah
),
cal(dt, i) as ( -- fill the range with every date, count days
    select mindt, 0
    from minmax
    union all
    select dt + 1 day, i + 1
    from cal
    where dt < (select maxdt from minmax) and i < 100000
)
select
    person, year(blah.dt), wk, min(blah.dt), min(audit)
from
    (select dt, int(i/7)+1 as wk from cal) t -- generate week numbers
inner join
    blah
    on t.dt = blah.dt
group by person, year(blah.dt), wk
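The custom week numbering can be illustrated without a calendar table. The Python sketch below assigns each record to a seven-day bucket counted from the earliest date in the data (the same int(i/7)+1 rule as above) and keeps the first date per person, year, and bucket. The function name `first_dates_per_week` is hypothetical.

```python
from datetime import date


def first_dates_per_week(records):
    """records: (person, date). Returns {person: [earliest date of each 7-day bucket]}."""
    origin = min(d for _, d in records)  # start of the "calendar"
    earliest = {}
    for person, d in records:
        wk = (d - origin).days // 7 + 1  # same rule as int(i/7)+1
        key = (person, d.year, wk)
        earliest[key] = min(d, earliest.get(key, d))
    result = {}
    for (person, _, _), d in sorted(earliest.items()):
        result.setdefault(person, []).append(d)
    return result


records = [
    (1, date(2000, 1, 1)),
    (1, date(2000, 1, 1)),
    (1, date(2000, 1, 2)),
    (1, date(2003, 4, 1)),
    (1, date(2003, 4, 3)),
]
weeks = first_dates_per_week(records)
print(weeks)  # two buckets for person 1, starting 2000-01-01 and 2003-04-01
```

The number of valid records per person is then just the length of each list.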

Getting repeated rows for where with or condition

I am trying to find employees who worked during a specific time period, along with the hours they worked during that period. My query has to join the employee table, which has the employee id as its PK and uses effective_date and expiration_date to bound the employee's position, to the timekeeping table, which has a pay period id number as its PK and also uses effective and expiration dates.
The problem with the expiration date in the employee table is that if the employee is currently employed then the date is '12/31/9999'. I am looking for employees that worked in a certain year and current employees as well as the hours they worked separated by pay periods.
When I take this condition into account in the WHERE clause with an OR, I get duplicates: employees who worked in the time period I am looking for and beyond, as well as duplicate records for the '12/31/9999' rows and the valid employee in that time period.
This is the query I am using:
SELECT
J.EMPL_ID
,J.DEPT
,J.UNIT
,J.LAST_NM
,J.FIRST_NM
,J.TITLE
,J.EFF_DT
,J.EXP_DT
,TM1.PPRD_ID
,TM1.EMPL_ID
,TM1.EXP_DT
,TM1.EFF_DT
--PULLING IN THE DAILY HRS WORKED
,(SELECT NVL(SUM(((to_number(SUBSTR(TI.DAY_1, 1
,INSTR(TI.DAY_1, ':', 1, 1)-1),99))*60)+
(TO_NUMBER(SUBSTR(TI.DAY_1
,INSTR(TI.DAY_1,':', -1, 1)+1),99))),0)
FROM PPRD_LINE TI
WHERE
TI.PPRD_ID=TM1.PPRD_ID
) "DAY1"
---AND THE REST OF THE DAYS FOR THE WORK PERIOD
FROM PPRD_LINE TM1
JOIN EMPL J ON TM1.EMPL_ID=J.EMPL_ID
WHERE
J.EMPL_ID='some id number' --for test purposes, will need to break down to depts-
AND
J.EFF_DT >=TO_DATE('1/1/2012','MM/DD/YYYY')
AND
(
J.EXP_DT<=TO_DATE('12/31/2012','MM/DD/YYYY')
OR
J.EXP_DT=TO_DATE('12/31/9999','MM/DD/YYYY') --I think the problem might be here???
)
GROUP BY
J.EMPL_ID
,J.DEPT
,J.UNIT
,J.LAST_NM
,J.FIRST_NM
,J.TITLE
,J.EFF_DT
,J.EXP_DT
,TM1.PPRD_ID
,TM1.EMPL_ID
,TM1.DOC_ID
,TM1.EXP_DT
,TM1.EFF_DT
ORDER BY
J.EFF_DT
,TM1.EFF_DT
,TM1.EXP_DT
I'm pretty sure I'm missing something simple but at this point I can't see the forest for the trees. Can anyone out there point me in the right direction?
an example of the duplicate records:
for employee 1 for the year of 2012:
Empl_ID  Dept  Unit  Last     First    Title    Eff Date  Exp Date    PPRD ID  Empl_ID  Exp Date_1  Eff Date_1
00001    04    012   Babbage  Charles  Somejob  4/1/2012  10/15/2012  0407123  00001    4/15/2012   4/1/2012
This record repeats 3 times, and the results run past the pay periods in 2012 up to the current pay period in 2013.
The subquery converts the time values so that hours and minutes can be added together for comparison further down the line.
I'm going to take a wild guess and see if this is what you want, remember I could not test so there may be typos.
If this is and especially if it is not, you should read in the FAQ about how to ask good questions. If this is what you were trying to understand your question should have been answered within about 10 mins. Because it was not clear what you were asking no one could answer your question.
You should include inputs and outputs and EXPECTED output in your question. The data you gave was not the output of the select statement (it did not have the DAY1 column).
SELECT
J.EMPL_ID
,J.DEPT
,J.UNIT
,J.LAST_NM
,J.FIRST_NM
,J.TITLE
,J.EFF_DT
,J.EXP_DT
,TM1.PPRD_ID
,TM1.EMPL_ID
-- ,TM1.EXP_DT Can't have these if you are summing across multiple records.
-- ,TM1.EFF_DT
--PULLING IN THE DAILY HRS WORKED
,NVL(SUM(((to_number(SUBSTR(TM1.DAY_1, 1,INSTR(TM1.DAY_1, ':', 1, 1)-1),99))*60)+
(TO_NUMBER(SUBSTR(TM1.DAY_1,INSTR(TM1.DAY_1,':', -1, 1)+1),99))),0)
"DAY1"
---AND THE REST OF THE DAYS FOR THE WORK PERIOD
FROM PPRD_LINE TM1
JOIN EMPL J ON TM1.EMPL_ID=J.EMPL_ID
WHERE
J.EMPL_ID='some id number' --for test purposes, will need to break down to depts-
AND J.EFF_DT >=TO_DATE('1/1/2012','MM/DD/YYYY')
AND(J.EXP_DT<=TO_DATE('12/31/2012','MM/DD/YYYY') OR J.EXP_DT=TO_DATE('12/31/9999','MM/DD/YYYY'))
GROUP BY
J.EMPL_ID
,J.DEPT
,J.UNIT
,J.LAST_NM
,J.FIRST_NM
,J.TITLE
,J.EFF_DT -- selected without an aggregate, so it must be grouped
,J.EXP_DT
,TM1.PPRD_ID
,TM1.EMPL_ID
,TM1.DOC_ID
ORDER BY
MIN(J.EFF_DT)
,MAX(TM1.EFF_DT)
,MAX(TM1.EXP_DT)
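The DAY1 expression in both queries is dense, so here is a hedged Python sketch of what it computes: each 'HH:MM' string is split at the colon, converted to minutes, and summed, with missing values ignored and an empty sum reported as 0 (the NVL(SUM(...), 0) part). The function names are hypothetical, and the sketch assumes each value contains exactly one colon.

```python
def hhmm_to_minutes(value):
    """Convert an 'HH:MM' string to a number of minutes."""
    hours, minutes = value.split(":")
    return int(hours) * 60 + int(minutes)


def total_minutes(values):
    # None entries are skipped, as SUM() skips NULLs; an empty input gives 0.
    return sum(hhmm_to_minutes(v) for v in values if v is not None)


day_1_values = ["8:00", "7:30", None]
print(total_minutes(day_1_values))  # 930
```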