SQL records only for 3 consecutive months - sql

I have a table for an employee attendance sheet:
emp_No Absent_Date
-------------------
111 01/03/2012
111 05/05/2012
222 13/02/2012
222 01/03/2012
222 02/03/2012
222 29/04/2012
222 09/09/2012
333 15/05/2012
333 18/09/2012
333 19/09/2012
I need to return the rows like below:
emp_No Absent_Date
-------------------
222 13/02/2012
222 01/03/2012
222 02/03/2012
222 29/04/2012
because only emp_No 222 has absences in 3 consecutive months.

What you are trying to do is to group the absences by consecutive months. Let me assume that you are using a reasonable database that supports the dense_rank() function and basic window functions.
The idea is to find months in sequence that have absences. Then, measure the length of each sequence for each employee and keep the sequences that span three or more months.
The query does this by converting the month to a month number -- 12 times the year plus the month. It then uses a simple observation. The month number minus a sequence of numbers is a constant, for consecutive months. Usually, I use row_number() for the sequence. Because you have duplicate absences in a month, I'm using dense_rank().
select emp_no, absent_date
from (select a.*,
max(monthnum) over (partition by emp_no, groupnum) as lastmonth,
min(monthnum) over (partition by emp_no, groupnum) as firstmonth
from (select a.*,
monthnum - dense_rank() over (partition by emp_no order by monthnum) as groupnum
from (select a.*,
year(a.absent_date)*12+month(a.absent_date) as monthnum
from Attendance a
) a
) a
) a
where lastmonth - firstmonth >= 2
Finally, because you want the absent dates -- as opposed to just the employee numbers -- I find the first and last month using window functions and use their difference as a filter.
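To make the "month number minus sequence number is constant" observation concrete, here is a minimal Python sketch of the same grouping logic. It is only an illustration of the idea, not a replacement for the query: the function name `consecutive_month_rows` and the `min_months` parameter are hypothetical, and the rows are the sample data from the question.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (emp_no, absent_date)
rows = [
    (111, date(2012, 3, 1)), (111, date(2012, 5, 5)),
    (222, date(2012, 2, 13)), (222, date(2012, 3, 1)), (222, date(2012, 3, 2)),
    (222, date(2012, 4, 29)), (222, date(2012, 9, 9)),
    (333, date(2012, 5, 15)), (333, date(2012, 9, 18)), (333, date(2012, 9, 19)),
]

def consecutive_month_rows(rows, min_months=3):
    """Return rows that fall in a run of at least min_months consecutive months."""
    months_by_emp = defaultdict(set)
    for emp, d in rows:
        months_by_emp[emp].add(d.year * 12 + d.month)  # same monthnum as the query

    keep = set()  # (emp, monthnum) pairs that sit in a long enough run
    for emp, months in months_by_emp.items():
        groups = defaultdict(list)
        # Ranking the distinct months plays the role of dense_rank()
        for rank, m in enumerate(sorted(months), start=1):
            groups[m - rank].append(m)  # constant within a run of consecutive months
        for run in groups.values():
            if len(run) >= min_months:
                keep.update((emp, m) for m in run)

    return [(emp, d) for emp, d in rows if (emp, d.year * 12 + d.month) in keep]

result = consecutive_month_rows(rows)
for emp, d in result:
    print(emp, d.strftime("%d/%m/%Y"))
```

On the sample data this keeps the four rows for employee 222 from February to April and drops the September row, because September starts a new group.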

I guess the easiest is to do a self join of the table three times, each time adding 1 month to the date:
SELECT DISTINCT S1.emp_No
FROM attendance_sheet S1
JOIN attendance_sheet S2
ON S1.emp_No = S2.emp_No
AND Month(S1.Absent_Date + 1 MONTH) = Month(S2.Absent_Date)
AND Year(S1.Absent_Date + 1 MONTH) = Year(S2.Absent_Date)
JOIN attendance_sheet S3
ON S2.emp_No = S3.emp_No
AND Month(S2.Absent_Date + 1 MONTH) = Month(S3.Absent_Date)
AND Year(S2.Absent_Date + 1 MONTH) = Year(S3.Absent_Date)
This will give you all the unique emp_No values. To get the result you want, you'll have to do another join (I'll use IN here because it is easier to read):
SELECT *
FROM attendance_sheet
WHERE emp_No IN (
SELECT S1.emp_No
FROM attendance_sheet S1
JOIN attendance_sheet S2
ON S1.emp_No = S2.emp_No
AND Month(S1.Absent_Date + 1 MONTH) = Month(S2.Absent_Date)
AND Year(S1.Absent_Date + 1 MONTH) = Year(S2.Absent_Date)
JOIN attendance_sheet S3
ON S2.emp_No = S3.emp_No
AND Month(S2.Absent_Date + 1 MONTH) = Month(S3.Absent_Date)
AND Year(S2.Absent_Date + 1 MONTH) = Year(S3.Absent_Date)
)
See the SQL Fiddle to try (I had to change the month adding syntax from standard SQL to MySQL).

Try this code:
-- Note: comparing MONTH() values alone does not handle December to January
SELECT DISTINCT * FROM
(
SELECT E1.emp_No,
E1.Absent_Date
FROM Attendance E1
JOIN Attendance E2
ON E2.emp_No = E1.emp_No
AND MONTH(E2.Absent_Date) = MONTH(E1.Absent_Date) + 1
JOIN Attendance E3
ON E3.emp_No = E2.emp_No
AND MONTH(E3.Absent_Date) = MONTH(E2.Absent_Date) + 1
UNION ALL
SELECT E2.emp_No,
E2.Absent_Date
FROM Attendance E1
JOIN Attendance E2
ON E2.emp_No = E1.emp_No
AND MONTH(E2.Absent_Date) = MONTH(E1.Absent_Date) + 1
JOIN Attendance E3
ON E3.emp_No = E2.emp_No
AND MONTH(E3.Absent_Date) = MONTH(E2.Absent_Date) + 1
UNION ALL
SELECT E3.emp_No,
E3.Absent_Date
FROM Attendance E1
JOIN Attendance E2
ON E2.emp_No = E1.emp_No
AND MONTH(E2.Absent_Date) = MONTH(E1.Absent_Date) + 1
JOIN Attendance E3
ON E3.emp_No = E2.emp_No
AND MONTH(E3.Absent_Date) = MONTH(E2.Absent_Date) + 1
) A

Related

Create months between two dates Snowflake SQL

I just want to generate the months between the dates of a date range using a SQL query.
example
You can use a table generator:
select '2022-07-04'::date +
row_number() over(partition by 1 order by null) - 1 GENERATED_DATE
from table(generator(rowcount => 365))
;
Just change the start date and the number of days in the series. You can use the datediff function to calculate the number of days between the start and end dates.
Edit: I just realized the generator table function requires a constant for the number of rows. That's easily solvable. Just set a higher number of rows than you'll need and specify the end of the series in a qualify clause:
set startdate = (select '2022-04-15'::date);
set enddate = (select '2022-07-04'::date);
select $startdate::date +
row_number() over(partition by 1 order by null) - 1 GENERATED_DATE
from table(generator(rowcount => 100000))
qualify GENERATED_DATE <= $enddate
;
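For comparison, the same "start date plus a running row number, cut off at the end date" idea can be sketched outside the database. This is a minimal Python illustration, not Snowflake code; the helper name `date_series` is hypothetical and the dates are the ones used in the query above.

```python
from datetime import date, timedelta

def date_series(start, end):
    """Yield every date from start to end, inclusive."""
    # The offset plays the role of row_number() - 1 in the query
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)

days = list(date_series(date(2022, 4, 15), date(2022, 7, 4)))
print(days[0], days[-1], len(days))
```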
You can use a table generator in a CTE, then select from the CTE, cartesian join it to your data table, and use a case expression to check whether each generated date falls between your from_date and to_date.
Then select from it:
select user_id, x_date
from (
with dates as (
select '2019-01-01'::date + row_number() over(order by 0) x_date
from table(generator(rowcount => 1500))
)
select d.x_date, t.*,
case
when d.x_date between t.from_date and t.to_date then 'Y' else 'N' end target_date
from dates d, my_table t --deliberate cartesian join
)
where target_date = 'Y'
order by 1,2
Output:
USER_ID X_DATE
1 2/20/2019
1 2/21/2019
1 2/22/2019
1 2/23/2019
2 2/22/2019
2 2/23/2019
2 2/24/2019
2 2/25/2019
2 2/26/2019
2 2/27/2019
2 2/28/2019
3 3/1/2019
3 3/2/2019
3 3/3/2019
3 3/4/2019
3 3/5/2019
=======EDIT========
Based on your comments below, you are actually looking for something different from your original screenshots. OK, so here we are still using the table generator, and then we truncate each x_date whose target_date is 'Y' to the first day of its month.
select distinct t.user_id, t.from_date, t.to_date, date_trunc('MONTH', z.x_date) as trunc_month
from (
with dates as (
select '2019-01-01'::date + row_number() over(order by 0) x_date
from table(generator(rowcount => 1500))
)
select d.x_date, t.*,
case
when d.x_date between t.from_date and t.to_date then 'Y' else 'N' end target_date
from dates d, my_table t
)z
join my_table t
on z.user_id = t.user_id
where z.target_date = 'Y'
order by 1,2
Output (modified User ID 3 to span 2 months):
USER_ID FROM_DATE TO_DATE TRUNC_MONTH
1 2/20/2019 2/23/2019 2/1/2019
2 2/22/2019 2/28/2019 2/1/2019
3 2/25/2019 3/5/2019 2/1/2019
3 2/25/2019 3/5/2019 3/1/2019
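If it helps to see the month truncation without the day-level generator, here is a small Python sketch that walks month by month from from_date to to_date. It is an assumption-laden illustration rather than a translation of the query: `months_spanned` is a hypothetical helper, and the example range is user 3 from the output above.

```python
from datetime import date

def months_spanned(from_date, to_date):
    """Return the first day of every month touched by [from_date, to_date]."""
    months = []
    year, month = from_date.year, from_date.month
    while (year, month) <= (to_date.year, to_date.month):
        months.append(date(year, month, 1))
        # Step to the next month, rolling over the year after December
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months

spanned = months_spanned(date(2019, 2, 25), date(2019, 3, 5))
print(spanned)
```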

Get most recent salary for the Employee

I have an Employee table which has Empl_id, Hourly rate and effective date.
I want to get the hourly rate of the employee for a given month if it exists; if not, give me the most recent hourly rate for that employee.
select EMPL_ID, HRLY_AMT
from Employee a
where exists (select 1
from Employee b
where b.EMPL_ID = a.EMPL_ID and
b.EFFECT_DT between '2019-10-01' and '2019-10-31'
)
group by EMPL_ID
Data Sample
Empl ID HOUR_AMT EFFECT_DT
1 10 2017-07-01
1 20 2018-10-01
1 40 2019-10-01
2 40 2017-06-01
2 45 2018-09-01
2 60 2019-09-01
Now If I pass Month = 07 & Year = 2017
It should show
Empl ID HOUR_AMT EFFECT_DT
1 10 2017-07-01
2 40 2017-12-01
Now If I pass Month = 11 & Year = 2019
Empl ID HOUR_AMT EFFECT_DT
1 40 2019-10-01
2 60 2019-09-01
You could use exists, but I prefer INNER JOINs. The inner join will remove all results except the one that falls immediately after your @GivenMonth.
SELECT a.EMPL_ID, a.HRLY_AMT
FROM Employee AS a
INNER JOIN (
SELECT EMPL_ID, MIN(EFFECT_DT) AS EFFECT_DT
FROM Employee
WHERE EFFECT_DT >= @GivenMonth
GROUP BY EMPL_ID
) AS r ON r.EMPL_ID = a.EMPL_ID AND r.EFFECT_DT = a.EFFECT_DT
Try the following.
For the most recent hourly rate of the employee:
select EMPL_ID, HRLY_AMT
from (select b.EMPL_ID, b.HRLY_AMT,
ROW_NUMBER() OVER (PARTITION BY b.EMPL_ID ORDER BY b.EFFECT_DT DESC) RecentRN
from Employee b
where b.EFFECT_DT between '2019-10-01' and '2019-10-31'
) t
where RecentRN = 1
For the highest hourly rate of the employee:
select EMPL_ID, HRLY_AMT
from (select b.EMPL_ID, b.HRLY_AMT,
ROW_NUMBER() OVER (PARTITION BY b.EMPL_ID ORDER BY b.HRLY_AMT DESC) HighRate
from Employee b
where b.EFFECT_DT between '2019-10-01' and '2019-10-31'
) t
where HighRate = 1
If your input is two values, month and year, your query should be:
SELECT Empl_ID ,
HOUR_AMT ,
EFFECT_DT
FROM (
SELECT Empl_ID ,
HOUR_AMT ,
EFFECT_DT ,
A.CurDiff,
ROW_NUMBER() OVER ( PARTITION BY A.Empl_ID ORDER BY ABS(A.CurDiff) ) Row_Id
FROM ( SELECT Empl_ID ,
HOUR_AMT ,
EFFECT_DT ,
DATEDIFF(DAY, CONVERT(DATE, @Year + '/' + @Month + '/01'), EFFECT_DT) CurDiff
FROM #tbl
) A
) Final
WHERE Final.Row_Id = 1
To view the demo for the output of this query Click Here
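One way to read the requirement is "the rate in effect at the end of the given month", that is, the row with the latest EFFECT_DT on or before the last day of that month. That reading is an assumption on my part and differs from the queries above. A minimal Python sketch of it, using the sample data from the question and a hypothetical helper `rate_as_of`:

```python
import calendar
from datetime import date

# Sample data from the question: (Empl ID, HOUR_AMT, EFFECT_DT)
rates = [
    (1, 10, date(2017, 7, 1)), (1, 20, date(2018, 10, 1)), (1, 40, date(2019, 10, 1)),
    (2, 40, date(2017, 6, 1)), (2, 45, date(2018, 9, 1)), (2, 60, date(2019, 9, 1)),
]

def rate_as_of(rates, year, month):
    """Per employee, pick the row with the latest EFFECT_DT on or before month end."""
    month_end = date(year, month, calendar.monthrange(year, month)[1])
    latest = {}
    for empl_id, amt, effect_dt in rates:
        if effect_dt <= month_end:
            current = latest.get(empl_id)
            if current is None or effect_dt > current[2]:
                latest[empl_id] = (empl_id, amt, effect_dt)
    return [latest[k] for k in sorted(latest)]

nov_2019 = rate_as_of(rates, 2019, 11)
print(nov_2019)
```

For Month = 11 and Year = 2019 this returns the 2019-10-01 row for employee 1 and the 2019-09-01 row for employee 2, which matches the second expected result in the question.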

Finding rows which don't have continuous date range in db2

I have data like the following and there are millions of rows like this
MBR MBR_SPAN EFF_DT END_DT
1 B 1/1/2011 12/31/2011
1 C 1/1/2012 12/31/2012
1 A 2/1/2013 12/31/2013
2 D 1/1/2010 12/31/2010
2 X 1/1/2011 12/31/2011
I need to find the row for each member where it is not continuous with the previous date range. In this case it is MBR 1 and MBR_SPAN A
I don't have a continuous column to sort on that would tell me which rows should form a continuous date range. It has to be determined by comparing each row with the previous one (maybe by sorting on eff_dt).
Also, it has to be done without creating any temp table, as I don't have access to create tables in db2.
Can anyone help?
Here is one method:
select *
from (select t.*,
lag(end_dt) over (partition by mbr order by eff_dt) as prev_end_dt
from t
) t
where eff_dt <> prev_end_dt + 1 day and prev_end_dt is not null;
Variation: suppose your table is called 'mydate' and you want a single-row result set from the above sample data:
select *
from (select t.*,
lag(end_dt) over (partition by mbr order by eff_dt) as prev_end_dt
from mydate as t
) x
where x.eff_dt <> x.prev_end_dt + 1 day and x.prev_end_dt is not null
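The comparison that lag() enables (each row's eff_dt against the previous row's end_dt plus one day, per member) can be sketched in Python as follows. This is only an illustration of the logic on the sample data; `non_continuous` is a hypothetical helper name.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample data from the question: (MBR, MBR_SPAN, EFF_DT, END_DT)
spans = [
    (1, "B", date(2011, 1, 1), date(2011, 12, 31)),
    (1, "C", date(2012, 1, 1), date(2012, 12, 31)),
    (1, "A", date(2013, 2, 1), date(2013, 12, 31)),
    (2, "D", date(2010, 1, 1), date(2010, 12, 31)),
    (2, "X", date(2011, 1, 1), date(2011, 12, 31)),
]

def non_continuous(spans):
    """Return rows whose EFF_DT is not the day after the previous row's END_DT."""
    out = []
    ordered = sorted(spans, key=lambda s: (s[0], s[2]))  # partition by mbr, order by eff_dt
    for _, group in groupby(ordered, key=lambda s: s[0]):
        prev_end = None  # like lag(): nothing to compare for a member's first row
        for span in group:
            if prev_end is not None and span[2] != prev_end + timedelta(days=1):
                out.append(span)
            prev_end = span[3]
    return out

gaps = non_continuous(spans)
print(gaps)
```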
Another method:
with tmp as
(
select f1.*, rownumber() over (partition by f1.mbr order by f1.eff_dt, f1.END_DT) as rang
from yourtablename f1
)
select f2.* from tmp f1
inner join tmp f2 on f1.mbr=f2.mbr and f1.rang=f2.rang-1 and f1.end_dt + 1 day <> f2.eff_dt

Getting Average based on 2 conditions (columns and rows)

My data is looking like this:
PRODUCT DEPT DATE PERCENTAGE
1 A JAN 2
1 B FEB 4
1 A MAR 1
1 B JAN 5
1 A FEB 3
1 B MAR 7
1 A JAN 3
1 B FEB 4
1 A MAR 2
1 B JAN 8
1 A FEB 9
1 B MAR 6
... ... ... ...
With thousands of different products and dozens of departments.
The calculation I have to go through is:
1 - Sum the percentages as follows: by product, dept and date (so Product 1 / DEPT A / JAN => SUM(PERCENTAGE)), for each PRODUCT, DEPT and DATE.
2 - When I have my sums, get the average of the 3 months for each product and dept (product 1 dept A: JAN / FEB / MAR, and so on)
3 - Get the max average (for each product, which dept has the highest average).
I have something which works but it's so long I am sure I can learn and make something better:
Select
Verylong_q.TFC,
Round(MAX(verylong_q.average),2) AS HIGHEST_AVERAGE
FROM
(
SELECT
Long_Q.TFC,
Long_Q.DEPT,
Long_Q.Percentage1,
Long_Q.Percentage2,
Long_Q.Percentage3,
((Percentage1 + Percentage2 + Percentage3)/3) AS Average
FROM
(
SELECT
t_Month1.TFC,
t_Month1.DEPT,
t_Month1.Percentage1,
t_Month2.Percentage2,
t_Month3.Percentage3
From
(
Select
pos.TFC,
mv.Dept AS DEPT,
sum(pos.percentage) AS Percentage1
FROM
TBO_POS pos,
TBL_MV mv
Where
pos.IV_ID = mv.IV_ID
and Date = […]
and TFC in […]
group by pos.TFC, mv.Dept, pos.Date
order by 1 DESC ) t_Month1
LEFT JOIN
(
Select
pos.TFC,
mv.Dept AS DEPT,
sum(pos.percentage) AS Percentage2
FROM
TBO_POS pos,
TBL_MV mv
Where
pos.IV_ID = mv.IV_ID
and Date = […]
and TFC in […]
group by pos.TFC, mv.Dept, pos.Date
order by 1 DESC ) t_Month2
On t_month1.DEPT = t_month2.DEPT and t_month1.TFC = t_month2.TFC
LEFT JOIN
(
Select
pos.TFC,
mv.Dept AS DEPT,
sum(pos.percentage) AS Percentage3
FROM
TBO_POS pos,
TBL_MV mv
Where
pos.IV_ID = mv.IV_ID
and Date = […]
and TFC in […]
group by pos.TFC, mv.Dept, pos.Date
order by 1 DESC ) t_Month3
on t_month1.DEPT = t_month3.DEPT and t_month1.TFC = t_month3.TFC
) Long_Q
) VeryLong_Q
Group by verylong_q.TFC
How could I do this in a better way? Thanks!
Isn't that simply:
Sum the percentages by product, dept and date in the innermost subquery
Get the average of the months for each product and dept in the next subquery
Get the max average for each product in the main query.
Query:
select product, max(avg_sum_percentage)
from
(
select product, dept, avg(sum_percentage) as avg_sum_percentage
from
(
select product, dept, date, sum(percentage) as sum_percentage
from mytable
group by product, dept, date
) per_product_dept_date
group by product, dept
) per_product_dept
group by product;
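The three nesting levels can also be traced step by step in Python, which may help when checking the numbers. This is a minimal sketch on the sample rows from the question, not part of the SQL answer; the function name `highest_average` is hypothetical, and it additionally reports which dept holds the maximum.

```python
from collections import defaultdict

# Sample rows from the question: (product, dept, date, percentage)
rows = [
    (1, "A", "JAN", 2), (1, "B", "FEB", 4), (1, "A", "MAR", 1),
    (1, "B", "JAN", 5), (1, "A", "FEB", 3), (1, "B", "MAR", 7),
    (1, "A", "JAN", 3), (1, "B", "FEB", 4), (1, "A", "MAR", 2),
    (1, "B", "JAN", 8), (1, "A", "FEB", 9), (1, "B", "MAR", 6),
]

def highest_average(rows):
    """Return {product: (dept, average)} for the dept with the highest average."""
    # Step 1: sum per product, dept and date
    sums = defaultdict(float)
    for product, dept, month, pct in rows:
        sums[(product, dept, month)] += pct
    # Step 2: collect the monthly sums per product and dept
    monthly = defaultdict(list)
    for (product, dept, _month), total in sums.items():
        monthly[(product, dept)].append(total)
    # Step 3: average them and keep the best dept per product
    best = {}
    for (product, dept), totals in monthly.items():
        avg = sum(totals) / len(totals)
        if product not in best or avg > best[product][1]:
            best[product] = (dept, avg)
    return best

best = highest_average(rows)
print(best)
```

For product 1, dept A has monthly sums 5, 12 and 3, and dept B has 13, 8 and 13, so dept B wins with an average of about 11.33.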
From what you describe, lag() seems like the appropriate method, along with aggregation and selection of the best:
select *
from (select product, dept, (sump + sump_1 + sump_2) /3 as avg_max,
row_number() over (partition by product order by (sump + sump_1 + sump_2) /3 desc) as seqnum
from (select product, dept, date, sum(percentage) as sump,
lag(sum(percentage)) over (partition by product, dept order by date) as sump_1,
lag(sum(percentage), 2) over (partition by product, dept order by date) as sump_2
from TBO_POS pos join
TBL_MV mv
on pos.IV_ID = mv.IV_ID
where Date = […] and TFC in […]
group by product, dept, date
) t
) t
where seqnum = 1;
This solution follows the description of the problem. It produces one row for each month and product. This version does not take into account missing values and other issues. I think this is the logic you want, but without expected results the question might be ambiguous.

Efficient join with a "correlated" subquery

Given three tables Dates(date aDate, doUse boolean), Days(rangeId int, day int, qty int) and Range(rangeId int, startDate date) in Oracle.
I want to join these so that Range is joined with Dates starting from aDate = startDate, where doUse = 1, with each day in Days.
Given a single range it might be done something like this
SELECT rangeId, aDate, CASE WHEN doUse = 1 THEN qty ELSE 0 END AS qty
FROM (
SELECT aDate, doUse, SUM(doUse) OVER (ORDER BY aDate) day
FROM Dates
WHERE aDate >= :startDate
) INNER JOIN (
SELECT rangeId, day,qty
FROM Days
WHERE rangeId = :rangeId
) USING (day)
ORDER BY day ASC
What I want to do is make a query for all ranges in Range, not just one.
The problem is that the join value "day" depends on the range's startDate to be calculated, which gives me some trouble in formulating a query.
Keep in mind that the Dates table is pretty huge so I would like to avoid calculating the day value from the first date in the table, while each Range Days shouldn't be more than a 100 days or so.
Edit: Sample data
Dates Days
aDate doUse rangeId day qty
2008-01-01 1 1 1 1
2008-01-02 1 1 2 10
2008-01-03 0 1 3 8
2008-01-04 1 2 1 2
2008-01-05 1 2 2 5
Ranges
rangeId startDate
1 2008-01-02
2 2008-01-03
Result
rangeId aDate qty
1 2008-01-02 1
1 2008-01-03 0
1 2008-01-04 10
1 2008-01-05 8
2 2008-01-03 0
2 2008-01-04 2
2 2008-01-05 5
Try this:
SELECT rt.rangeId, aDate, CASE WHEN doUse = 1 THEN qty ELSE 0 END AS qty
FROM (
SELECT *
FROM (
SELECT r.*, t.*, SUM(doUse) OVER (PARTITION BY rangeId ORDER BY aDate) AS span
FROM (
SELECT r.rangeId, startDate, MAX(day) AS dm
FROM Range r, Days d
WHERE d.rangeid = r.rangeid
GROUP BY
r.rangeId, startDate
) r, Dates t
WHERE t.adate >= startDate
ORDER BY
rangeId, t.adate
)
WHERE
span <= dm
) rt, Days d
WHERE d.rangeId = rt.rangeID
AND d.day = GREATEST(rt.span, 1)
P. S. It seems to me that the only point to keep all these Dates in the database is to get a continuous calendar with holidays marked.
You may generate a calendar of arbitrary length in Oracle using following construction:
SELECT :startDate + LEVEL
FROM dual
CONNECT BY
LEVEL < :length
and keep only holidays in Dates. A simple join will show you which Dates are holidays and which are not.
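The running count of doUse days that the query above builds per range can be sketched procedurally to check it against the expected result. This is a minimal Python illustration under stated assumptions: the in-memory structures and the helper name `expand` are hypothetical, Dates is assumed to be sorted by aDate, and each range's qty values are listed in day order.

```python
from datetime import date

# Sample data from the question
dates = [  # (aDate, doUse), sorted by aDate
    (date(2008, 1, 1), 1), (date(2008, 1, 2), 1), (date(2008, 1, 3), 0),
    (date(2008, 1, 4), 1), (date(2008, 1, 5), 1),
]
days = {1: [1, 10, 8], 2: [2, 5]}                    # rangeId -> qty ordered by day
ranges = {1: date(2008, 1, 2), 2: date(2008, 1, 3)}  # rangeId -> startDate

def expand(ranges, days, dates):
    """Assign each range's qty values to successive doUse = 1 dates from its startDate."""
    out = []
    for range_id, start in sorted(ranges.items()):
        qtys = days[range_id]
        used = 0  # running count of doUse days consumed, like SUM(doUse) OVER (...)
        for a_date, do_use in dates:
            if a_date < start:
                continue
            if used == len(qtys):
                break  # every day of this range has been placed
            if do_use:
                out.append((range_id, a_date, qtys[used]))
                used += 1
            else:
                out.append((range_id, a_date, 0))  # skipped date, qty 0
    return out

result = expand(ranges, days, dates)
for row in result:
    print(row)
```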
Ok, so maybe I've found a way. Something like this:
SELECT irangeId, aDate + sum(case when doUse = 1 then 0 else 1 end) over (partition by irangeId order by aDate) as aDate, qty
FROM Dates INNER JOIN (
select irangeId, startDate + day - 1 as aDate, qty
from Range inner join Days using (irangeid)
) USING (aDate)
Now I just need a way to fill in the missing dates...
Edit: Nah, this way means that I'll miss the doUse value of the last dates...