Counting employees from one job level to another - sql

I have a snapshot of a dataset as follows:
effective_date hire_date name job_level direct_report
01.01.2018 01.01.2018 xyz 5 null
01.02.2018 01.01.2018 xyz 5 null
01.03.2018 01.01.2018 xyz 5 null
01.04.2018 01.01.2018 xyz 6 null
01.05.2018 01.01.2018 xyz 6 null
01.01.2018 01.02.2018 abc 5 null
01.02.2018 01.02.2018 abc 5 null
01.03.2018 01.02.2018 abc 5 null
01.04.2018 01.02.2018 abc 5 null
01.05.2018 01.02.2018 abc 5 null
Effective date is the snapshot date: the data holds one row of information per employee per day.
Hire date is the date when the employee was hired.
Job level is the level the employee held on that particular day.
How can I find out how many employees moved/were promoted from level 5 to level 6 over this whole period?

Here is one method that uses two levels of aggregation. You can get the employees that were promoted by comparing the minimum date for "5" to the maximum date of "6":
select name
from t
where job_level in (5, 6)
group by name
having min(case when job_level = 5 then effective_date end) < max(case when job_level = 6 then effective_date end);
To count them:
select count(*)
from (select name
from t
where job_level in (5, 6)
group by name
having min(case when job_level = 5 then effective_date end) < max(case when job_level = 6 then effective_date end)
) x;
Alternatively, you can use lag():
select count(distinct name)
from (select t.*, lag(job_level) over (partition by name order by effective_date) as prev_job_level
from t
) t
where prev_job_level = 5 and job_level = 6;
The two are subtly different, but within the range of the ambiguity of the question. For instance, the first would count 5 --> 4 --> 6, the second would not.
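To make the difference concrete, here is a small check using Python's standard `sqlite3` module (assuming SQLite 3.25 or later, which is when `lag()` was added). The dates are stored as ISO-8601 text so that comparisons are chronological, and `qrs` is a hypothetical third employee, not in the question, who goes 5 --> 4 --> 6.

```python
import sqlite3

# (effective_date, name, job_level); ISO dates so text comparison is chronological.
# "qrs" is a hypothetical employee who goes 5 --> 4 --> 6.
ROWS = [
    ("2018-01-01", "xyz", 5), ("2018-02-01", "xyz", 5), ("2018-03-01", "xyz", 5),
    ("2018-04-01", "xyz", 6), ("2018-05-01", "xyz", 6),
    ("2018-01-01", "abc", 5), ("2018-02-01", "abc", 5), ("2018-03-01", "abc", 5),
    ("2018-04-01", "abc", 5), ("2018-05-01", "abc", 5),
    ("2018-01-01", "qrs", 5), ("2018-02-01", "qrs", 4), ("2018-03-01", "qrs", 6),
]

# Aggregation: first level-5 date earlier than the last level-6 date.
AGGREGATE_SQL = """
select count(*)
from (select name
      from t
      where job_level in (5, 6)
      group by name
      having min(case when job_level = 5 then effective_date end)
           < max(case when job_level = 6 then effective_date end)) x
"""

# lag(): a level-6 row whose immediately preceding row is level 5.
LAG_SQL = """
select count(distinct name)
from (select t.*,
             lag(job_level) over (partition by name order by effective_date) as prev_job_level
      from t) t
where prev_job_level = 5 and job_level = 6
"""


def count_promotions(sql):
    """Load ROWS into an in-memory table t and return the single count the query produces."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (effective_date text, name text, job_level integer)")
    conn.executemany("insert into t values (?, ?, ?)", ROWS)
    (count,) = conn.execute(sql).fetchone()
    conn.close()
    return count


aggregate_count = count_promotions(AGGREGATE_SQL)
lag_count = count_promotions(LAG_SQL)
print(aggregate_count, lag_count)  # 2 1
```

The aggregation query counts `qrs` because only levels 5 and 6 are inspected; the `lag()` query does not, because the row just before the level-6 row is level 4.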

You can try this:
select count(distinct e1.name) from employees e1
where e1.effective_date between '01.01.2018' and '01.05.2018'
and e1.job_level = 5
and exists (select * from employees e2 where e1.name = e2.name
and e2.effective_date > e1.effective_date
and e2.job_level = 6
);
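A minimal sketch of the EXISTS query running against SQLite from Python. It assumes the dates are stored as ISO-8601 text and reads 01.05.2018 as 1 May 2018. With dd.mm.yyyy text, BETWEEN and > would compare the strings character by character rather than chronologically, so in a real table effective_date is better stored as a DATE.

```python
import sqlite3

# (effective_date, name, job_level): xyz is level 5 for three months, then level 6; abc stays at 5.
ROWS = [(f"2018-0{m}-01", "xyz", 5 if m <= 3 else 6) for m in range(1, 6)]
ROWS += [(f"2018-0{m}-01", "abc", 5) for m in range(1, 6)]

# Count employees with a level-5 row that is followed, later in time, by a level-6 row.
EXISTS_SQL = """
select count(distinct e1.name)
from employees e1
where e1.effective_date between '2018-01-01' and '2018-05-01'
  and e1.job_level = 5
  and exists (select 1
              from employees e2
              where e2.name = e1.name
                and e2.effective_date > e1.effective_date
                and e2.job_level = 6)
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table employees (effective_date text, name text, job_level integer)")
conn.executemany("insert into employees values (?, ?, ?)", ROWS)
promoted = conn.execute(EXISTS_SQL).fetchone()[0]
conn.close()
print(promoted)  # 1 (only xyz)
```

Like the aggregation answer above, this counts anyone with a level-6 row after some level-5 row, even if another level comes in between.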


Convert rows to columns in SQL in Teradata

I have data which looks like this:
Name   Date        Bal
John   2022-01-01  10
John   2022-01-02  4
John   2022-01-03  7
David  2022-01-01  13
David  2022-01-02  15
David  2022-01-03  20
I want the Bal values spread across one column per date, like:
Name   2022-01-01  2022-01-02  2022-01-03
John   10          4           7
David  13          15          20
What I tried:
SELECT
NAME,
CASE WHEN DATE= '2022-01-01' THEN EOD_BALANCE ELSE NULL END "01-Jan-22",
CASE WHEN DATE= '2022-01-02' THEN EOD_BALANCE ELSE NULL END "02-Jan-22"
FROM TABL1
but I am not getting the required results.
You want a pivot query here, which means you should aggregate by name and then take the max of the CASE expressions:
SELECT
NAME,
MAX(CASE WHEN DATE = '2022-01-01' THEN EOD_BALANCE END) AS "01-Jan-22",
MAX(CASE WHEN DATE = '2022-01-02' THEN EOD_BALANCE END) AS "02-Jan-22",
MAX(CASE WHEN DATE = '2022-01-03' THEN EOD_BALANCE END) AS "03-Jan-22"
FROM TABL1
GROUP BY NAME;
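The conditional-aggregation pivot can be checked with SQLite from Python. The table and balance column names follow the question (TABL1, EOD_BALANCE); naming the date column `bal_date` is an assumption made here to avoid a column literally called DATE.

```python
import sqlite3

ROWS = [
    ("John", "2022-01-01", 10), ("John", "2022-01-02", 4), ("John", "2022-01-03", 7),
    ("David", "2022-01-01", 13), ("David", "2022-01-02", 15), ("David", "2022-01-03", 20),
]

# One output row per name; each CASE keeps the balance only on its own date,
# and MAX() picks that single non-NULL value out of the group.
PIVOT_SQL = """
select name,
       max(case when bal_date = '2022-01-01' then eod_balance end) as "01-Jan-22",
       max(case when bal_date = '2022-01-02' then eod_balance end) as "02-Jan-22",
       max(case when bal_date = '2022-01-03' then eod_balance end) as "03-Jan-22"
from tabl1
group by name
order by name desc
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table tabl1 (name text, bal_date text, eod_balance integer)")
conn.executemany("insert into tabl1 values (?, ?, ?)", ROWS)
pivoted = conn.execute(PIVOT_SQL).fetchall()
conn.close()
for row in pivoted:
    print(row)
```

Without the GROUP BY (as in the original attempt) every input row produces its own output row with mostly NULLs, which is why the attempt did not return one line per name.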

Query to Return all Working Periods per Employee and Cost Code

I have data showing employee, cost code and WeekEnd (Sunday). WeekEnd is the last day of the working week, which runs from Monday through Sunday:
EmpID CostCode WeekEnd
=============================
1 1 01/02/2022
1 1 01/09/2022
(the employee did not work in the week ending 1/16/2022)
1 1 01/23/2022
1 1 01/30/2022
1 2 02/06/2022
1 3 02/13/2022
1 3 02/20/2022
Need to get result like this:
EmpID CostCode FirstWeekEnd LastWeekEnd
==============================================
1 1 01/02/2022 01/09/2022
1 1 01/23/2022 01/30/2022
1 2 02/06/2022 02/06/2022
1 3 02/13/2022 02/20/2022
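How can I get all periods (start to end) during which the employee worked on each cost code?
This can be solved by computing a group number (ranking) for each run of consecutive weeks and then aggregating on that number as well: a week gets a flag of 1 when it is more than 7 days after the employee's previous week, and a running sum of the flags numbers the runs.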
SELECT EmpId, CostCode
, MIN(WeekEnd) AS FirstWeekEnd
, MAX(WeekEnd) AS LastWeekEnd
FROM
(
SELECT *
, SUM(Flag) OVER (PARTITION BY EmpID ORDER BY WeekEnd) AS Rnk
FROM
(
SELECT *
, IIF(7>=DATEDIFF(day, LAG(WeekEnd) OVER (PARTITION BY EmpID ORDER BY WeekEnd), WeekEnd),0,1) AS Flag
FROM your_table
) q1
) q2
GROUP BY EmpId, CostCode, Rnk
ORDER BY EmpId, FirstWeekEnd;
EmpId  CostCode  FirstWeekEnd  LastWeekEnd
===========================================
1      1         2022-01-02    2022-01-09
1      1         2022-01-23    2022-01-30
1      2         2022-02-06    2022-02-06
1      3         2022-02-13    2022-02-20
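A minimal SQLite sketch of the same gaps-and-islands technique, run from Python. Two things differ from the answer's T-SQL and are assumptions of this sketch: SQLite has no DATEDIFF, so the day gap is a julianday() difference and the flag is a CASE inside SUM(); and the windows are partitioned by EmpID and CostCode rather than EmpID alone. Employee 2 is hypothetical data showing why the second change can matter: partitioned by EmpID only, their two separate cost-code-1 weeks (2022-01-02 and 2022-01-16) would share a group number and be reported as one period.

```python
import sqlite3

# (EmpID, CostCode, WeekEnd). Employee 1 is the question's data; employee 2 is a
# hypothetical case that leaves cost code 1 for a week and then comes back to it.
ROWS = [
    (1, 1, "2022-01-02"), (1, 1, "2022-01-09"),
    (1, 1, "2022-01-23"), (1, 1, "2022-01-30"),
    (1, 2, "2022-02-06"),
    (1, 3, "2022-02-13"), (1, 3, "2022-02-20"),
    (2, 1, "2022-01-02"), (2, 2, "2022-01-09"), (2, 1, "2022-01-16"),
]

ISLANDS_SQL = """
select EmpID, CostCode, min(WeekEnd) as FirstWeekEnd, max(WeekEnd) as LastWeekEnd
from (
    -- running count of gap weeks = one group number per run of consecutive weeks
    select *,
           sum(case when DaysSincePrev <= 7 then 0 else 1 end)
               over (partition by EmpID, CostCode order by WeekEnd) as Grp
    from (
        -- days since the previous week of the same employee and cost code (NULL for the first)
        select *,
               julianday(WeekEnd)
                 - lag(julianday(WeekEnd)) over (partition by EmpID, CostCode order by WeekEnd)
                 as DaysSincePrev
        from weeks
    ) q1
) q2
group by EmpID, CostCode, Grp
order by EmpID, FirstWeekEnd
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table weeks (EmpID integer, CostCode integer, WeekEnd text)")
conn.executemany("insert into weeks values (?, ?, ?)", ROWS)
periods = conn.execute(ISLANDS_SQL).fetchall()
conn.close()
for period in periods:
    print(period)
```

For employee 1 this reproduces the four periods shown above; employee 2 comes out as three one-week periods.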

T-SQL - Select patients who are readmitted (within some duration) with the same diagnosis

I have a table with the following schema:
CREATE TABLE Codes
(
diagnosis_code CHAR,
visit_date DATE,
visit_id INT,
patient_id int
);
I would like to output the patient_ids where the patient is readmitted (so a different visit_id) with the same diagnosis_code within a certain time (say 15 days). For example, if I have the following entries in the table:
diagnosis_code visit_date visit_id patient_id
-------------- ---------- ----------- -----------
A 2018-01-01 1 1
B 2018-01-01 1 1
A 2018-01-07 2 1
C 2018-01-01 3 2
D 2018-01-01 4 3
D 2018-01-20 5 3
E 2018-01-01 6 4
E 2018-01-01 6 4
A 2018-01-07 7 1
The query would return only patient_id = 1, and the rationales are as follows:
1, because between visit_id 1 and 2, this patient shared diagnosis code A.
Not 2 because this patient was only admitted once.
Not 3 because this patient, although readmitted for the same diagnosis, was not readmitted within 15 days of their initial visit.
Not 4 because this patient has a duplicated diagnosis code in the same visit.
Notice that patient_id = 1 is readmitted for the same diagnosis during visit_id = 7, but he was already counted once before.
You could try a simple join, adding the conditions you described:
select
distinct c.patient_id
from codes c
join codes d on d.patient_id = c.patient_id
and d.visit_id <> c.visit_id
and d.diagnosis_code = c.diagnosis_code
and d.visit_date between c.visit_date
and dateadd(day, 15, c.visit_date)
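A quick check of the self-join against the question's sample rows, using SQLite from Python. SQLite has no DATEADD, so `date(c.visit_date, '+15 days')` stands in for `dateadd(day, 15, c.visit_date)`; the dates are ISO-8601 text, so BETWEEN compares them chronologically.

```python
import sqlite3

ROWS = [  # (diagnosis_code, visit_date, visit_id, patient_id), as in the question
    ("A", "2018-01-01", 1, 1), ("B", "2018-01-01", 1, 1), ("A", "2018-01-07", 2, 1),
    ("C", "2018-01-01", 3, 2),
    ("D", "2018-01-01", 4, 3), ("D", "2018-01-20", 5, 3),
    ("E", "2018-01-01", 6, 4), ("E", "2018-01-01", 6, 4),
    ("A", "2018-01-07", 7, 1),
]

READMIT_SQL = """
select distinct c.patient_id
from codes c
join codes d
  on d.patient_id = c.patient_id
 and d.visit_id <> c.visit_id                 -- a different visit
 and d.diagnosis_code = c.diagnosis_code      -- with the same diagnosis
 and d.visit_date between c.visit_date        -- within 15 days afterwards
                      and date(c.visit_date, '+15 days')
order by c.patient_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table codes (diagnosis_code text, visit_date text, visit_id integer, patient_id integer)")
conn.executemany("insert into codes values (?, ?, ?, ?)", ROWS)
readmitted = [row[0] for row in conn.execute(READMIT_SQL)]
conn.close()
print(readmitted)  # [1]
```

Patient 3 drops out because 2018-01-20 is 19 days after 2018-01-01, and patient 4 because the duplicated rows share visit_id 6.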
I used LAG():
declare @Codes table
(
diagnosis_code CHAR,
visit_date DATE,
visit_id INT,
patient_id int
);
insert into @Codes
values
('A', '2018-01-01' ,1, 1)
,('B' , '2018-01-01', 1, 1)
,('A' , '2018-01-07', 2, 1)
,('C' ,'2018-01-01', 3, 2)
/*
D 2018-01-01 4 3
D 2018-01-15 5 3
E 2018-01-01 6 4
E 2018-01-01 6 4
A 2018-01-07 7 1
*/
select *
from (
select *
--,rn=row_number() over (partition by patient_ID,diagnosis_code order by visit_date)
,DaysSince = datediff(day,lag(visit_date,1) over (partition by patient_ID,diagnosis_code order by visit_date),visit_date)
from @Codes
) a
where a.DaysSince<=15
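The LAG() answer can be checked the same way. With all of the question's rows loaded (including the ones the answer comments out), the filter as written also returns patient 4, because the duplicated E rows in visit 6 are 0 days apart. A minimal sketch of one possible fix, which is an addition rather than part of the answer, also compares the previous row's visit_id. SQLite's julianday() difference stands in for DATEDIFF.

```python
import sqlite3

ROWS = [  # (diagnosis_code, visit_date, visit_id, patient_id), as in the question
    ("A", "2018-01-01", 1, 1), ("B", "2018-01-01", 1, 1), ("A", "2018-01-07", 2, 1),
    ("C", "2018-01-01", 3, 2),
    ("D", "2018-01-01", 4, 3), ("D", "2018-01-20", 5, 3),
    ("E", "2018-01-01", 6, 4), ("E", "2018-01-01", 6, 4),
    ("A", "2018-01-07", 7, 1),
]

# Days since the previous row with the same patient and diagnosis, plus that row's visit_id.
BASE = """
select *,
       julianday(visit_date)
         - lag(julianday(visit_date)) over (partition by patient_id, diagnosis_code
                                            order by visit_date, visit_id) as days_since,
       lag(visit_id) over (partition by patient_id, diagnosis_code
                           order by visit_date, visit_id) as prev_visit_id
from codes
"""

AS_WRITTEN = f"select distinct patient_id from ({BASE}) a where days_since <= 15 order by patient_id"
WITH_VISIT_CHECK = (
    f"select distinct patient_id from ({BASE}) a "
    "where days_since <= 15 and prev_visit_id <> visit_id order by patient_id"
)

conn = sqlite3.connect(":memory:")
conn.execute("create table codes (diagnosis_code text, visit_date text, visit_id integer, patient_id integer)")
conn.executemany("insert into codes values (?, ?, ?, ?)", ROWS)
as_written = [row[0] for row in conn.execute(AS_WRITTEN)]
with_visit_check = [row[0] for row in conn.execute(WITH_VISIT_CHECK)]
conn.close()
print(as_written, with_visit_check)  # [1, 4] [1]
```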
You can also use the built-in FIRST_VALUE and DATEADD functions to achieve this:
SELECT
DISTINCT patient_id,diagnosis_code
FROM
(SELECT
FIRST_VALUE(visit_date) OVER (PARTITION BY patient_id,diagnosis_code ORDER BY visit_id ASC) AS Initial_Visit,
DATEADD(DAY,15,first_value(visit_date) OVER (PARTITION BY patient_id,diagnosis_code ORDER BY visit_id ASC)) Window
,* FROM Codes
)m
WHERE
Initial_Visit <> visit_date
AND visit_date <= Window
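The FIRST_VALUE approach, sketched in SQLite from Python. Two assumptions of this port: DATEADD becomes `date(initial_visit, '+15 days')`, and that window end is computed in the outer WHERE instead of being aliased `Window`, since WINDOW is a keyword in SQLite.

```python
import sqlite3

ROWS = [  # (diagnosis_code, visit_date, visit_id, patient_id), as in the question
    ("A", "2018-01-01", 1, 1), ("B", "2018-01-01", 1, 1), ("A", "2018-01-07", 2, 1),
    ("C", "2018-01-01", 3, 2),
    ("D", "2018-01-01", 4, 3), ("D", "2018-01-20", 5, 3),
    ("E", "2018-01-01", 6, 4), ("E", "2018-01-01", 6, 4),
    ("A", "2018-01-07", 7, 1),
]

FIRST_VALUE_SQL = """
select distinct patient_id, diagnosis_code
from (select codes.*,
             -- date of the first visit for this patient and diagnosis
             first_value(visit_date) over (partition by patient_id, diagnosis_code
                                           order by visit_id) as initial_visit
      from codes) m
where initial_visit <> visit_date
  and visit_date <= date(initial_visit, '+15 days')
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table codes (diagnosis_code text, visit_date text, visit_id integer, patient_id integer)")
conn.executemany("insert into codes values (?, ?, ?, ?)", ROWS)
readmissions = conn.execute(FIRST_VALUE_SQL).fetchall()
conn.close()
print(readmissions)  # [(1, 'A')]
```

Note that this measures every later visit against the first visit for that diagnosis only, so a readmission within 15 days of a later, non-initial visit would not be reported.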

Oracle SQL Developer Subscriber - Creating a Cross Table

I have a table UPCALL_HISTORY that has 3 columns: SUBSCRIBER_ID, START_DATE and END_DATE. Let the number of unique subscribers be N.
I want to create a new table with 3 columns:
SUBSCRIBER_ID: All of the unique subscriber ids repeated 36 times in a row.
MONTHLY_CALENDAR_ID: For each SUBSCRIBER_ID, this column will have dates listed from July 2015 until July 2018 (36 months).
ACTIVE: This column will be used as a flag for each subscriber and whether they have a subscription during that month. This subscription data is in a table called UPCALL_HISTORY.
I am fairly new to SQL, don't have a lot of experience. I am good at Python but it seems that SQL doesn't work like Python.
Any query ideas that could help me build this table?
Let my UPCALL_HISTORY table be:
+---------------+------------+------------+
| SUBSCRIBER_ID | START_DATE | END_DATE |
+---------------+------------+------------+
| 119 | 01/07/2015 | 01/08/2015 |
| 120 | 01/08/2015 | 01/09/2015 |
| 121 | 01/09/2015 | 01/10/2015 |
+---------------+------------+------------+
I want a table that looks like:
+---------------+------------+--------+
| SUBSCRIBER_ID | MON_CA | ACTIVE |
+---------------+------------+--------+
| 119 | 01/07/2015 | 1 |
| * | 01/08/2015 | 0 |
| * | 01/09/2015 | 0 |
| (36 times) | 01/10/2015 | 0 |
| * | * | 0 |
| 119 | 01/07/2018 | 0 |
+---------------+------------+--------+
The same pattern continues for 120 and 121.
EDIT: Added Example
Here's how I understood the question.
Sample table and several rows:
SQL> create table upcall_history
(subscriber_id number,
start_date date,
end_date date);
Table created.
SQL> insert into upcall_history
select 1, date '2015-12-25', date '2016-01-13' from dual union
select 1, date '2017-07-10', date '2017-07-11' from dual union
select 2, date '2018-01-01', date '2018-04-24' from dual;
3 rows created.
Create a new table. For each distinct SUBSCRIBER_ID it creates 36 fixed "monthly" rows (as you stated).
SQL> create table new_table as
select
x.subscriber_id,
add_months(date '2015-07-01', column_value - 1) monthly_calendar_id,
0 active
from (select distinct subscriber_id from upcall_history) x,
table(cast(multiset(select level from dual
connect by level <= 36
) as sys.odcinumberlist));
Table created.
Update the ACTIVE column to 1 for rows whose MONTHLY_CALENDAR_ID falls between the (month-truncated) START_DATE and END_DATE of the UPCALL_HISTORY table.
SQL> merge into new_table n
using (select subscriber_id, start_date, end_date from upcall_history) x
on ( n.subscriber_id = x.subscriber_id
and n.monthly_calendar_id between trunc(x.start_date, 'mm')
and trunc(x.end_date, 'mm')
)
when matched then
update set n.active = 1;
7 rows merged.
SQL>
Result (only ACTIVE = 1):
SQL> select * from new_table
where active = 1
order by subscriber_id, monthly_calendar_id;
SUBSCRIBER_ID MONTHLY_CA ACTIVE
------------- ---------- ----------
1 2015-12-01 1
1 2016-01-01 1
1 2017-07-01 1
2 2018-01-01 1
2 2018-02-01 1
2 2018-03-01 1
2 2018-04-01 1
7 rows selected.
SQL>
If you're on 12c you can use an inline view of all the months with cross apply to get the combinations of those with all IDs:
select uh.subscriber_id, m.month,
case when trunc(uh.start_date, 'MM') <= m.month
and (uh.end_date is null or uh.end_date >= add_months(m.month, 1))
then 1 else 0 end as active
from upcall_history uh
cross apply (
select add_months(trunc(sysdate, 'MM'), - level) as month
from dual
connect by level <= 36
) m
order by uh.subscriber_id, m.month;
I've made it a rolling 36-months window up to the current month, but you may actually want fixed dates as you had in the question.
With sample data from a CTE:
with upcall_history (subscriber_id, start_date, end_date) as (
select 1, date '2015-09-04', date '2015-12-15' from dual
union all select 2, date '2017-12-04', date '2018-05-15' from dual
)
that generates 72 rows:
SUBSCRIBER_ID MONTH ACTIVE
------------- ---------- ----------
1 2015-07-01 0
1 2015-08-01 0
1 2015-09-01 1
1 2015-10-01 1
1 2015-11-01 1
1 2015-12-01 0
1 2016-01-01 0
...
2 2017-11-01 0
2 2017-12-01 1
2 2018-01-01 1
2 2018-02-01 1
2 2018-03-01 1
2 2018-04-01 1
2 2018-05-01 0
2 2018-06-01 0
You can use that to create a new table, or populate an existing table; though if you do want a rolling window then a view might be more appropriate.
If you aren't on 12c then cross apply isn't available - you'll get "ORA-00905: missing keyword".
You can get the same result with two CTEs (one to get all the months, the other to get all the IDs) cross-joined, and then outer joined to your actual data:
with m (month) as (
select add_months(trunc(sysdate, 'MM'), - level)
from dual
connect by level <= 36
),
i (subscriber_id) as (
select distinct subscriber_id
from upcall_history
)
select i.subscriber_id, m.month,
case when uh.subscriber_id is null then 0 else 1 end as active
from m
cross join i
left join upcall_history uh
on uh.subscriber_id = i.subscriber_id
and trunc(uh.start_date, 'MM') <= m.month
and (uh.end_date is null or uh.end_date >= add_months(m.month, 1))
order by i.subscriber_id, m.month;
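The cross-join-then-outer-join idea can be tried outside Oracle. This is a SQLite sketch run from Python, with these substitutions as assumptions: SQLite has no CONNECT BY, so a recursive CTE generates the months; the window is the question's fixed 36 months starting July 2015 rather than a rolling window from sysdate; `trunc(..., 'MM')` becomes `date(..., 'start of month')` and `add_months(..., 1)` becomes `date(..., '+1 month')`. The history rows are the sample data from the 12c answer above.

```python
import sqlite3

# (subscriber_id, start_date, end_date), the sample rows from the 12c answer above
HISTORY = [
    (1, "2015-09-04", "2015-12-15"),
    (2, "2017-12-04", "2018-05-15"),
]

CALENDAR_SQL = """
with recursive m(month) as (
    select '2015-07-01'                       -- fixed start, as in the question
    union all
    select date(month, '+1 month') from m
    where month < '2018-06-01'                -- 36 months: 2015-07 .. 2018-06
),
i(subscriber_id) as (
    select distinct subscriber_id from upcall_history
)
select i.subscriber_id, m.month,
       case when uh.subscriber_id is null then 0 else 1 end as active
from m
cross join i
left join upcall_history uh
  on uh.subscriber_id = i.subscriber_id
 and date(uh.start_date, 'start of month') <= m.month
 and (uh.end_date is null or uh.end_date >= date(m.month, '+1 month'))
order by i.subscriber_id, m.month
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table upcall_history (subscriber_id integer, start_date text, end_date text)")
conn.executemany("insert into upcall_history values (?, ?, ?)", HISTORY)
rows = conn.execute(CALENDAR_SQL).fetchall()
conn.close()

# Months flagged active, per subscriber
active_months = {
    sid: [month for s, month, active in rows if s == sid and active == 1]
    for sid in (1, 2)
}
print(len(rows), active_months)
```

The active months match the 12c output shown earlier (September to November 2015 for subscriber 1, December 2017 to April 2018 for subscriber 2), and every subscriber gets all 36 calendar rows.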
You can do this in 11g using Partitioned Outer Joins, like so:
WITH upcall_history AS (SELECT 119 subscriber_id, to_date('01/07/2015', 'dd/mm/yyyy') start_date, to_date('01/08/2015', 'dd/mm/yyyy') end_date FROM dual UNION ALL
SELECT 120 subscriber_id, to_date('01/08/2015', 'dd/mm/yyyy') start_date, to_date('01/09/2015', 'dd/mm/yyyy') end_date FROM dual UNION ALL
SELECT 121 subscriber_id, to_date('01/09/2015', 'dd/mm/yyyy') start_date, to_date('01/10/2015', 'dd/mm/yyyy') end_date FROM dual),
mnths AS (SELECT add_months(TRUNC(SYSDATE, 'mm'), + 1 - LEVEL) mnth
FROM dual
CONNECT BY LEVEL <= 12 * 3 + 1)
SELECT uh.subscriber_id,
m.mnth,
CASE WHEN mnth BETWEEN start_date AND end_date - 1 THEN 1 ELSE 0 END active
FROM mnths m
LEFT OUTER JOIN upcall_history uh PARTITION BY (uh.subscriber_id) ON (1=1)
ORDER BY uh.subscriber_id,
m.mnth;
SUBSCRIBER_ID MNTH ACTIVE
------------- ----------- ----------
119 01/07/2015 1
119 01/08/2015 0
119 01/09/2015 0
119 01/10/2015 0
<snip>
119 01/06/2018 0
119 01/07/2018 0
--
120 01/07/2015 0
120 01/08/2015 1
120 01/09/2015 0
120 01/10/2015 0
<snip>
120 01/06/2018 0
120 01/07/2018 0
--
121 01/07/2015 0
121 01/08/2015 0
121 01/09/2015 1
121 01/10/2015 0
<snip>
121 01/06/2018 0
121 01/07/2018 0
N.B. I have assumed some things about your start/end dates and what constitutes active; hopefully it should be easy enough for you to tweak the case statement to fit the logic that works best for your situation.
I also believe this can be solved with a CROSS JOIN. All I had to do was create a small table of all of the dates and then CROSS JOIN it with the table of subscribers.
Example: https://www.essentialsql.com/cross-join-introduction/

SQL - Find if column dates include at least partially a date range

I need to create a report and I am struggling with the SQL script.
The table I want to query is a company_status_history table which has entries like the following (the ones that I can't figure out)
Table company_status_history
Columns:
| id | company_id | status_id | effective_date |
Data:
| 1 | 10 | 1 | 2016-12-30 00:00:00.000 |
| 2 | 10 | 5 | 2017-02-04 00:00:00.000 |
| 3 | 11 | 5 | 2017-06-05 00:00:00.000 |
| 4 | 11 | 1 | 2018-04-30 00:00:00.000 |
I want to answer to the question "Get all companies that have been at least for some point in status 1 inside the time period 01/01/2017 - 31/12/2017"
The rows above are the cases I don't know how to handle, since I need to add logic of this type:
"If this row is status 1 and its date is before the date range, check the next row to see if it has a date inside the date range."
"If this row is status 1 and its date is after the date range, check the row before to see if it has a date inside the date range."
I think this can be handled as a gaps-and-islands problem. Consider the following input data (the OP's sample data plus two additional rows):
id company_id status_id effective_date
-------------------------------------------
1 10 1 2016-12-15
2 10 1 2016-12-30
3 10 5 2017-02-04
4 10 4 2017-02-08
5 11 5 2017-06-05
6 11 1 2018-04-30
You can use the following query:
SELECT t.id, t.company_id, t.status_id, t.effective_date, x.cnt
FROM company_status_history AS t
OUTER APPLY
(
SELECT COUNT(*) AS cnt
FROM company_status_history AS c
WHERE c.status_id = 1
AND c.company_id = t.company_id
AND c.effective_date < t.effective_date
) AS x
ORDER BY company_id, effective_date
to get:
id company_id status_id effective_date cnt
-----------------------------------------------
1 10 1 2016-12-15 0
2 10 1 2016-12-30 1
3 10 5 2017-02-04 2
4 10 4 2017-02-08 2
5 11 5 2017-06-05 0
6 11 1 2018-04-30 0
Now you can identify status = 1 islands using:
;WITH CTE AS
(
SELECT t.id, t.company_id, t.status_id, t.effective_date, x.cnt
FROM company_status_history AS t
OUTER APPLY
(
SELECT COUNT(*) AS cnt
FROM company_status_history AS c
WHERE c.status_id = 1
AND c.company_id = t.company_id
AND c.effective_date < t.effective_date
) AS x
)
SELECT id, company_id, status_id, effective_date,
ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY effective_date) -
cnt AS grp
FROM CTE
Output:
id company_id status_id effective_date grp
-----------------------------------------------
1 10 1 2016-12-15 1
2 10 1 2016-12-30 1
3 10 5 2017-02-04 1
4 10 4 2017-02-08 2
5 11 5 2017-06-05 1
6 11 1 2018-04-30 2
Calculated field grp will help us identify those islands:
;WITH CTE AS
(
SELECT t.id, t.company_id, t.status_id, t.effective_date, x.cnt
FROM company_status_history AS t
OUTER APPLY
(
SELECT COUNT(*) AS cnt
FROM company_status_history AS c
WHERE c.status_id = 1
AND c.company_id = t.company_id
AND c.effective_date < t.effective_date
) AS x
), CTE2 AS
(
SELECT id, company_id, status_id, effective_date,
ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY effective_date) -
cnt AS grp
FROM CTE
)
SELECT company_id,
MIN(effective_date) AS start_date,
CASE
WHEN COUNT(*) > 1 THEN DATEADD(DAY, -1, MAX(effective_date))
ELSE MIN(effective_date)
END AS end_date
FROM CTE2
GROUP BY company_id, grp
HAVING COUNT(CASE WHEN status_id = 1 THEN 1 END) > 0
Output:
company_id start_date end_date
-----------------------------------
10 2016-12-15 2017-02-03
11 2018-04-30 2018-04-30
All you need now is to keep the islands above that overlap the specified interval.
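The overlap test itself is the standard one for closed intervals: an island overlaps the range exactly when it starts on or before the range end and ends on or after the range start. In SQL that would be a filter such as `WHERE start_date <= '2017-12-31' AND end_date >= '2017-01-01'` on the final query's output (a suggestion, not part of the answer). A small Python sketch of the predicate, applied to the two islands shown above:

```python
from datetime import date


def overlaps(start, end, range_start, range_end):
    """True if the closed intervals [start, end] and [range_start, range_end] share a day."""
    return start <= range_end and end >= range_start


# Status-1 islands (company_id, start_date, end_date) from the output above.
islands = [
    (10, date(2016, 12, 15), date(2017, 2, 3)),
    (11, date(2018, 4, 30), date(2018, 4, 30)),
]
RANGE_START, RANGE_END = date(2017, 1, 1), date(2017, 12, 31)

companies = sorted({cid for cid, start, end in islands if overlaps(start, end, RANGE_START, RANGE_END)})
print(companies)  # [10]
```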
Maybe this is what you are looking for? For this kind of question you need to join two instances of your table. Here I am simply joining each row with the next record by id, which is probably not entirely correct; to do it better, you can create a new id with a window function such as ROW_NUMBER, ordering the table by your criteria.
If this row is status 1 and its date is before the date range, check the next row if it has a date inside the date range.
declare @range_st date = '2017-01-01'
declare @range_en date = '2017-12-31'
select
csh1.id,
csh1.company_id,
case
when csh1.status_id=1 and csh1.effective_date<@range_st
then
case
when csh2.effective_date between @range_st and @range_en then 1
else 0
end
else NULL
end as in_range
from company_status_history csh1
left join company_status_history csh2 -- next row
on csh2.id=csh1.id+1
and csh2.company_id=csh1.company_id
Implementing the second criterion:
"If this row is status 1 and its date is after the date range, check the row before if it has a date inside the date range."
declare @range_st date = '2017-01-01'
declare @range_en date = '2017-12-31'
select
csh1.id,
csh1.company_id,
case
when csh1.status_id=1 and csh1.effective_date<@range_st
then
case
when csh2.effective_date between @range_st and @range_en then 1
else 0
end
when csh1.status_id=1 and csh1.effective_date>@range_en
then
case
when csh3.effective_date between @range_st and @range_en then 1
else 0
end
else null -- other cases: not decided yet
end as in_range
from company_status_history csh1
left join company_status_history csh2 -- next row
on csh2.id=csh1.id+1
and csh2.company_id=csh1.company_id
left join company_status_history csh3 -- previous row
on csh3.id=csh1.id-1
and csh3.company_id=csh1.company_id
I would suggest using a CTE and the ROW_NUMBER window function. With these you can find the desired records. An example:
DECLARE @t TABLE(
id INT
,company_id INT
,status_id INT
,effective_date DATETIME
)
INSERT INTO @t VALUES
(1, 10, 1, '2016-12-30 00:00:00.000')
,(2, 10, 5, '2017-02-04 00:00:00.000')
,(3, 11, 5, '2017-06-05 00:00:00.000')
,(4, 11, 1, '2018-04-30 00:00:00.000')
DECLARE @StartDate DATETIME = '2017-01-01';
DECLARE @EndDate DATETIME = '2017-12-31';
WITH cte AS(
SELECT *
,ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY effective_date) AS rn
FROM @t
),
cteLeadLag AS(
SELECT c.*, ISNULL(c2.effective_date, c.effective_date) LagEffective, ISNULL(c3.effective_date, c.effective_date) LeadEffective
FROM cte c
LEFT JOIN cte c2 ON c2.company_id = c.company_id AND c2.rn = c.rn-1
LEFT JOIN cte c3 ON c3.company_id = c.company_id AND c3.rn = c.rn+1
)
SELECT 'Included' AS RangeStatus, *
FROM cteLeadLag
WHERE status_id = 1
AND effective_date BETWEEN @StartDate AND @EndDate
UNION ALL
SELECT 'Following' AS RangeStatus, *
FROM cteLeadLag
WHERE status_id = 1
AND effective_date > @EndDate
AND LagEffective BETWEEN @StartDate AND @EndDate
UNION ALL
SELECT 'Trailing' AS RangeStatus, *
FROM cteLeadLag
WHERE status_id = 1
AND effective_date < @StartDate
AND LeadEffective BETWEEN @StartDate AND @EndDate
I first select all records with their leading and lagging dates, and then I perform your checks for inclusion in the desired time span.
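A SQLite port of the ROW_NUMBER lead/lag approach, run from Python against the question's four rows. Assumptions of this sketch: COALESCE replaces ISNULL, bound parameters `:range_start` and `:range_end` replace the T-SQL variables, only id and company_id are returned, and the "Trailing" case (a status-1 record before the range) is written as `effective_date < :range_start`.

```python
import sqlite3

ROWS = [  # (id, company_id, status_id, effective_date), the question's data
    (1, 10, 1, "2016-12-30"),
    (2, 10, 5, "2017-02-04"),
    (3, 11, 5, "2017-06-05"),
    (4, 11, 1, "2018-04-30"),
]

LEAD_LAG_SQL = """
with cte as (
    select *, row_number() over (partition by company_id order by effective_date) as rn
    from t
),
lead_lag as (
    -- previous (c2) and next (c3) row of the same company, via rn - 1 and rn + 1
    select c.*,
           coalesce(c2.effective_date, c.effective_date) as lag_effective,
           coalesce(c3.effective_date, c.effective_date) as lead_effective
    from cte c
    left join cte c2 on c2.company_id = c.company_id and c2.rn = c.rn - 1
    left join cte c3 on c3.company_id = c.company_id and c3.rn = c.rn + 1
)
select 'Included' as range_status, id, company_id from lead_lag
where status_id = 1 and effective_date between :range_start and :range_end
union all
select 'Following', id, company_id from lead_lag
where status_id = 1 and effective_date > :range_end
  and lag_effective between :range_start and :range_end
union all
select 'Trailing', id, company_id from lead_lag
where status_id = 1 and effective_date < :range_start
  and lead_effective between :range_start and :range_end
order by company_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, company_id integer, status_id integer, effective_date text)")
conn.executemany("insert into t values (?, ?, ?, ?)", ROWS)
matches = conn.execute(LEAD_LAG_SQL, {"range_start": "2017-01-01", "range_end": "2017-12-31"}).fetchall()
conn.close()
print(matches)  # [('Trailing', 1, 10), ('Following', 4, 11)]
```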
Try this; it should be self-explanatory. It addresses this part of your question:
I want to answer to the question "Get all companies that have been at least for some point in status 1 inside the time period 01/01/2017 - 31/12/2017"
If you want the companies that have been in status 1 at some moment and also have records in the requested period:
SELECT *
FROM company_status_history
WHERE company_id IN
( SELECT company_id
FROM company_status_history
WHERE status_id=1 )
AND effective_date BETWEEN '2017-01-01' AND '2017-12-31'
If you want the records that are in status 1 inside the period:
SELECT *
FROM company_status_history
WHERE status_id=1
AND effective_date BETWEEN '2017-01-01' AND '2017-12-31'