SQL Server : finding consecutive absence counts for students over custom dates - sql

I have a table that stores student attendance for each day. I need to find students who have been absent for 3 consecutive days. However, the dates on which attendance is taken are not contiguous: days such as non-attendance days, holidays, and weekends are excluded. Attendance was taken only on the dates for which records exist in the table.
The data looks like this:
StudentId    Date         Attendance
-----------------------------------------
178234       1/1/2017     P
178234       5/1/2017     A
178234       6/1/2017     A
178234       11/1/2017    A
178432       1/1/2017     P
178432       5/1/2017     A
178432       6/1/2017     P
178432       11/1/2017    A
In the above case the result should be:
StudentId    AbsenceStartDate    AbsenceEndDate    ConsecutiveAbsences
----------------------------------------------------------------------------
178234       5/1/2017            11/1/2017         3
I have tried to implement the solution from "Calculating Consecutive Absences in SQL". However, that only worked when the dates were contiguous. Any suggestions would be great, thanks.

Oh, you have both absences and presences in the table. You can use the difference of row_number() values approach:
select studentid,
       min(date) as AbsenceStartDate,
       max(date) as AbsenceEndDate,
       count(*) as ConsecutiveAbsences
from (select a.*,
             row_number() over (partition by studentid order by date) as seqnum,
             row_number() over (partition by studentid, attendance order by date) as seqnum_a
      from attendance a
     ) a
where attendance = 'A'
group by studentid, (seqnum - seqnum_a)
having count(*) >= 3;
The difference of the two row numbers identifies runs of consecutive rows with the same attendance value. This is a little tricky to understand, but if you run the subquery, you should see that the difference is constant within each run of consecutive absences or presences. You only care about absences, hence the where clause in the outer query.
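To see why the difference stays constant within a run, here is a minimal Python sketch that mimics the two row_number() calls on the question's sample data. It is only an illustration of the idea, not the SQL itself. The function name absence_streaks is made up for this example, and the dates are written as ISO strings under the assumption that the question's dates are day/month/year.

```python
from collections import defaultdict

# Sample rows from the question: (student_id, date, attendance).
# Assumption: the question's dates are d/m/yyyy, rewritten here as ISO strings.
rows = [
    (178234, "2017-01-01", "P"),
    (178234, "2017-01-05", "A"),
    (178234, "2017-01-06", "A"),
    (178234, "2017-01-11", "A"),
    (178432, "2017-01-01", "P"),
    (178432, "2017-01-05", "A"),
    (178432, "2017-01-06", "P"),
    (178432, "2017-01-11", "A"),
]


def absence_streaks(rows, min_length=3):
    """Group absences by (student, seqnum - seqnum_a), like the query does."""
    seqnum = defaultdict(int)    # row_number() per student
    seqnum_a = defaultdict(int)  # row_number() per (student, attendance)
    groups = defaultdict(list)   # (student, difference) -> absence dates
    for student, day, att in sorted(rows, key=lambda r: (r[0], r[1])):
        seqnum[student] += 1
        seqnum_a[(student, att)] += 1
        if att == "A":  # the "where attendance = 'A'" filter
            diff = seqnum[student] - seqnum_a[(student, att)]
            groups[(student, diff)].append(day)
    # the "having count(*) >= 3" filter
    return [
        (student, days[0], days[-1], len(days))
        for (student, _), days in groups.items()
        if len(days) >= min_length
    ]


streaks = absence_streaks(rows)
```

For student 178432 the two lone absences get differences 1 and 2, so they land in separate groups and neither reaches three rows.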

Try this:
declare @t table (sid int, d date, att char(1))
insert @t (sid, d, att) values
    (178234, '1/1/2017', 'P'),
    (178234, '5/1/2017', 'A'),
    (178234, '6/1/2017', 'A'),
    (178234, '11/1/2017', 'A'),
    (178432, '1/1/2017', 'P'),
    (178432, '5/1/2017', 'A'),
    (178432, '6/1/2017', 'P'),
    (178432, '11/1/2017', 'A')
Select s.sid, Min(s.d) startDt, Max(e.d) endDt, s.att, e.att, count(*)
from @t s join @t e on e.d <=
    (select max(d) from @t m
     Where sid = s.sid
       and d > s.d
       and att = 'A'
       and not exists
           (Select * from @t
            where sid = s.sid
              and d between s.d and m.d
              and att = 'P'))
Where s.att = 'A'
  and s.d = (Select Min(d) from @t
             Where sid = s.sid
               and d < e.d
               and att = 'A')
group by s.sid, s.d, s.att, e.att
This is also tricky to explain.
Basically, it joins the table to itself using the aliases s (for start) and e (for end), where the s row is the first row in a set of contiguous absences and the e rows are all the following absences that come before the next date on which the student is present. This generates the sets of 'A' rows that have no 'P' row within them. The query then groups by the appropriate values to return the earliest and latest date, and the count of rows, in each group.
The last where clause ensures that the s row is the first row in the group.

Related

Count consecutive occurrences SQL - PostgreSQL

I am trying to count the number of consecutive weeks an employee went to work. I have this table that records whether jon or andy went to work in certain weeks (I have all weeks of the year).
I am working in PostgreSQL.
What I would like to know is the number of times each person went to work for x consecutive weeks.
The way to read the output below is that Andy twice went to work for two consecutive weeks.
I feel like I am close. In Python I could probably use a for loop, but in PostgreSQL I am a bit lost.
Thanks!
We compute the length of each run of consecutive weeks worked per person, then group by that length and the person.
select person
      ,consecutive_weeks
      ,count(*)/consecutive_weeks as times
from (
      select person
            ,sum(case when "went to work?" = 1 then 1 end) over(partition by person, grp) as consecutive_weeks
      from (
            select *
                  ,count(mrk) over(partition by person order by week) as grp
            from (
                  select *
                        ,case when "went to work?" <> lag("went to work?") over(partition by person order by week) then 1 end as mrk
                  from t
                 ) t
           ) t
     ) t
where consecutive_weeks is not null
group by person, consecutive_weeks
order by person
person    consecutive_weeks    times
-----------------------------------------
andy      2                    2
john      3                    1
john      2                    1
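The same run-counting idea can be sketched in a few lines of Python. This is only an illustration: the question's table is not shown, so the attendance data below is hypothetical and was chosen to reproduce the result above. The function name streak_counts is also made up.

```python
from collections import Counter
from itertools import groupby

# Hypothetical data (the question's table is not shown):
# person -> "went to work?" flags, listed in week order.
attendance = {
    "andy": [1, 1, 0, 1, 1, 0],
    "john": [1, 1, 1, 0, 0, 1, 1],
}


def streak_counts(attendance):
    """Return {(person, consecutive_weeks): times} for runs of weeks worked."""
    counts = Counter()
    for person, flags in attendance.items():
        # groupby starts a new run wherever the flag changes, which is the
        # same boundary the lag() comparison marks in the query.
        for went, run in groupby(flags):
            if went == 1:
                counts[(person, len(list(run)))] += 1
    return dict(counts)


result = streak_counts(attendance)
```

Each key of result corresponds to one row of the output table, with the dictionary value playing the role of the times column.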
You can find groups of weeks where a person was present, assigning a running id to each row of the group, and then apply a count on the results, performing a group by on the id:
with cte as (
    select t3.person, t3.k, count(*) c
    from (select t.*,
                 (select sum((t1.person = t.person and t1.week <= t.week and t1.at_work = 0)::int) k
                  from tbl t1)
          from tbl t) t3
    where t3.at_work != 0
    group by t3.person, t3.k
)
select c.person, c.c, count(*) c1
from cte c
group by c.person, c.c
order by c1

SQL Server, how to get younger users?

I'm trying to get the youngest users in each country. For example, I have the following tables.
If more than one of the youngest users have the same age, they should all be shown.
Thanks
You can try this query, which counts users per country and birth year in a subquery and then joins back to the users table.
select u.idcountry, t.name, u.username, (DATEPART(year, getdate()) - t.years) 'age'
from (
    SELECT u.idcountry, c.name, DATEPART(year, u.birthday) as 'years', count(*) as 'cnt'
    FROM users u
    inner join country c on u.idcountry = c.idcountry
    group by u.idcountry, c.name, DATEPART(year, u.birthday)
) t
inner join users u on t.idcountry = u.idcountry and t.years = DATEPART(year, u.birthday)
where t.cnt > 1
sqlfiddle: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=9baab959f79b1fa8c28ed87a8640e85d
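Use the rank() window function. Window functions are not allowed in a WHERE clause, so compute the rank in a derived table and filter on it:
select ...
from (select ...,
             rank() over (partition by idcountry order by birthday) as rnk
      from ...
     ) t
where rnk = 1
Rows with the same birthday in a country are ranked the same, so this returns all of the youngest people if there's more than one.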
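As a rough illustration of how ties share rank 1, here is a small Python sketch. The rows and the function name rank_one are hypothetical. Note that which end of the age range gets rank 1 depends on the sort direction: ordering by birthday ascending puts the earliest birthday (the oldest person) first, while descending puts the latest birthday (the youngest person) first.

```python
from datetime import date

# Hypothetical rows: (username, idcountry, birthday)
users = [
    ("ana", 1, date(2001, 3, 9)),
    ("ben", 1, date(2001, 3, 9)),
    ("cid", 1, date(1990, 7, 1)),
    ("dee", 2, date(1985, 1, 2)),
]


def rank_one(users, youngest=True):
    """Return the usernames that would get rank() = 1 within each country."""
    pick = max if youngest else min  # latest birthday = youngest person
    best = {}
    for _, country, birthday in users:
        best[country] = pick(best.get(country, birthday), birthday)
    # Every row tying with the best birthday shares rank 1.
    return sorted(
        name for name, country, birthday in users if birthday == best[country]
    )


youngest_users = rank_one(users)
oldest_users = rank_one(users, youngest=False)
```

Here ana and ben tie as the youngest in country 1, so both are returned.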
This is a little tricky. I would use window functions -- count the people of a particular age and choose the ones where there are duplicates for the youngest.
You don't specify how to define age, so I'll just use the earliest calendar year:
select u.*
from (select u.*,
             count(*) over (partition by idcountry, year(birthday)) as cnt_cb,
             rank() over (partition by idcountry order by year(birthday)) as rnk
      from users u
     ) u
where cnt_cb > 1 and rnk = 1;
I'll let you handle the joins to bring in the country name.
Your sample data and desired results show the oldest users within each country when more than one of the oldest have the same age. The query below will do that, assuming age is calculated using full birth date.
WITH
    users AS (
        SELECT
              username
            , birthday
            , idcountry
            , (CAST(CONVERT(char(8),GETDATE(),112) AS int) - CAST(CONVERT(char(8),birthday,112) AS int)) / 10000 AS age
            , RANK() OVER(PARTITION BY idcountry ORDER BY (CAST(CONVERT(char(8),GETDATE(),112) AS int) - CAST(CONVERT(char(8),birthday,112) AS int)) / 10000 DESC) AS age_rank
        FROM dbo.Users
    )
    , oldest_users AS (
        SELECT
              username
            , birthday
            , idcountry
            , age
            , COUNT(*) OVER(PARTITION BY idcountry, age_rank ORDER BY age_rank) AS age_count
        FROM users
        WHERE age_rank = 1
    )
SELECT
      c.idcountry
    , c.name
    , oldest_users.age
    , oldest_users.username
FROM oldest_users
JOIN dbo.Country AS c ON c.idcountry = oldest_users.idcountry
WHERE
    oldest_users.age_count > 1;

Insert into another table after fetching latest date and performing an inner join

I have a table called "Member_Details" which has multiple records for each member_ID. For Example,
I have another table called "BMI_Data" that looks like the following.
The goal is to fetch the names of those members whose "BMI" in "Member_Details" is less than the "target_BMI" in "BMI_Data" table and insert it into a new table called "results" with "Member_ID, First_Name and BMI" as its schema.
Also, one consideration is to fetch the latest data available in the "Member_Details" for each member (based on date) and then do the comparison
The result for the above scenario would be something like this.
I tried using the following query
INSERT INTO results_table (Member_ID, First_Name, BMI)
select c.Member_ID, First_Name, BMI
from (SELECT *,
             ROW_NUMBER() OVER (PARTITION BY Member_ID ORDER BY Date desc) AS ROWNUM
      FROM Member_Details
     ) x
JOIN BMI_Data c ON x.Member_ID = c.Member_ID
where x.BMI < c.Target_BMI
The above query doesn't fetch the latest date and simply loads all records in which member BMI is less than target_BMI.
Please help !
An alternate query might be
INSERT INTO results_table (Member_ID, First_Name, BMI)
select md2.member_ID, md2.First_Name, md2.BMI
from BMI_Data bd
inner join (select distinct md.member_ID,
                   md.First_Name,
                   (select top 1 BMI
                    from Member_Details
                    where member_ID = md.member_ID
                    order by Date desc) BMI
            from Member_Details md
           ) md2 on md2.member_ID = bd.member_ID
where md2.BMI < bd.Target_BMI
First, you haven't filtered on the row number after defining it:
INSERT INTO results_table (Member_ID, First_Name, BMI)
select c.Member_ID, First_Name, BMI
from (SELECT *,
             ROW_NUMBER() OVER (PARTITION BY Member_ID ORDER BY Date desc) AS ROWNUM
      FROM Member_Details
     ) x
JOIN BMI_Data c ON x.Member_ID = c.Member_ID
where x.ROWNUM = 1 and x.BMI < c.Target_BMI;
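The effect of the ROWNUM = 1 filter can be sketched in Python: keep only each member's most recent record, then compare it with the target. The rows below are hypothetical (the question's tables are not shown), and latest_below_target is a made-up name.

```python
from datetime import date

# Hypothetical rows: (Member_ID, First_Name, BMI, Date)
member_details = [
    (1, "Ann", 27.0, date(2018, 1, 10)),
    (1, "Ann", 23.5, date(2018, 4, 30)),
    (2, "Bob", 22.0, date(2018, 2, 1)),
    (2, "Bob", 26.0, date(2018, 5, 1)),
]
bmi_data = {1: 25.0, 2: 25.0}  # hypothetical Member_ID -> Target_BMI


def latest_below_target(member_details, bmi_data):
    """Keep each member's newest record, then apply the BMI comparison."""
    latest = {}
    for row in member_details:
        member_id = row[0]
        # Equivalent of ROW_NUMBER() ... ORDER BY Date desc, keeping row 1.
        if member_id not in latest or row[3] > latest[member_id][3]:
            latest[member_id] = row
    return [
        (member_id, name, bmi)
        for member_id, name, bmi, _ in latest.values()
        if member_id in bmi_data and bmi < bmi_data[member_id]
    ]


results = latest_below_target(member_details, bmi_data)
```

Bob has an older record below the target, but his latest record is above it, so he is left out. Without the latest-record step he would have been included.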
Note that there is no such date as '31-April-2018'! You might have meant '1-May-2018'.
In any case, when ordering by Date it is important to first convert it to the DATE data type; otherwise the ordering is not correct. The query below orders properly and also shows an alternative approach using ARRAY_AGG() with ORDER BY and LIMIT 1:
#standardSQL
INSERT INTO results_table (Member_ID, First_Name, BMI)
SELECT * EXCEPT(Target_BMI)
FROM (
    SELECT Member_ID, First_Name,
           ARRAY_AGG(BMI ORDER BY PARSE_DATE('%d-%B-%Y', Date) DESC LIMIT 1)[OFFSET(0)] BMI
    FROM `project.dataset.member_details`
    GROUP BY Member_ID, First_Name
) d
JOIN `project.dataset.bmi_data` t
USING(Member_ID)
WHERE BMI < Target_BMI

SQL I have to find the entire row of people that did something the same day. count function?

I have a table called Donates.
I have to find all d_names who donated more than once on a single day.
I have no idea how to combine those 2 queries.
Any help is appreciated.
This is my table.
3 fields.
donors receivers giftdate
A donor can give a given receiver a gift only once.
Donors can donate more than once and receivers can receive more than once.
I just have to find who donated a gift more than once on a single day, but I also need to know when and to whom.
You are correct that you would use COUNT, and you would use a HAVING clause to filter:
select d_name
from Donates
group by d_name
having count(1) > 1
You will of course need to add whatever other clauses your requirements call for, such as limiting to or grouping by day. The simplest is to limit the results to a single day (you can use both WHERE and HAVING in the same query):
select d_name
from Donates
where g_date = @Date
group by d_name
having count(1) > 1
Responding to your comment, you can join on this query as a derived table:
select *
from Donates
inner join (
    select d_name
    from Donates
    where g_date = @Date
    group by d_name
    having count(1) > 1
) x on Donates.d_name = x.d_name
After all the comments in multiple places, I believe you're finally looking for something like:
select Donates.d_name, Donates.r_name, Donates.g_date
from Donates
inner join (
    select d_name, g_date
    from Donates
    group by d_name, g_date
    having count(1) > 1
) x on Donates.d_name = x.d_name and Donates.g_date = x.g_date
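To try the derived-table join end to end, here is a small sketch that runs the same shape of query against an in-memory SQLite database through Python's sqlite3 module. SQLite is swapped in only for convenience, the rows are hypothetical, and the column names d_name, r_name, and g_date follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Donates (d_name text, r_name text, g_date text)")
# Hypothetical sample rows.
conn.executemany(
    "insert into Donates values (?, ?, ?)",
    [
        ("alice", "xavier", "2020-01-01"),
        ("alice", "yvonne", "2020-01-01"),
        ("alice", "zed", "2020-01-02"),
        ("bob", "xavier", "2020-01-01"),
    ],
)

# The inner query finds (donor, day) pairs with more than one gift;
# the join brings back the full rows for those pairs.
result = conn.execute(
    """
    select d.d_name, d.r_name, d.g_date
    from Donates d
    inner join (
        select d_name, g_date
        from Donates
        group by d_name, g_date
        having count(*) > 1
    ) x on d.d_name = x.d_name and d.g_date = x.g_date
    order by d.r_name
    """
).fetchall()
```

Only alice's two gifts on 2020-01-01 come back. Her single gift on 2020-01-02 and bob's single gift do not.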
OP now says he is using Oracle, can't use GROUP BY, and wants all fields in the table.
He wants donors who donated more than once in any given day (regardless of the receivers).
select distinct d1.*
from Donates d1
inner join Donates d2
    on d1.donors = d2.donors
   and trunc(d1.giftdate) = trunc(d2.giftdate)
   and d1.rowid <> d2.rowid  -- any other row by the same donor on the same day
;
select *
from Donates
where d_name in (
    select d_name
    from Donates
    where cast(d_date as Date) in (
        select cast(d_date as Date)
        from Donates
        group by cast(d_date as Date)
        having count(cast(d_date as Date)) > 1
    )
    group by d_name
)
I would suggest simply using analytic functions:
select d.*
from (select d.*,
             count(*) over (partition by trunc(d.giftdate), d.name) as cnt
      from donates d
     ) d
where cnt > 1;

Sql Server Find Next Most Recent Changed Record

In my employee history table I'm trying to find what the salary was and then what it was changed to. Each Salary change inserts a new record because the new salary is considered a new "job" so it has a start and end date attached. I can select all these dates fine but I keep getting duplicates because I can't seem to compare the current record only against its most recent prior record for that employee. (if that makes sense)
I would like the results to be along the lines of:
Employee Name    OldSalary    NewSalary    ChangeDate (EndDate)
Joe              40,000       42,000       01/10/2011
Example data looks like this:
EmployeeHistId    EmpId    Name    Salary       StartDate     EndDate
1                 45       Joe     40,000.00    01/05/2011    01/10/2011
2                 45       Joe     42,000.00    01/11/2011    NULL
3                 46       Bob     20,000.00    01/12/2011    NULL
The Swiss army ROW_NUMBER() to the rescue:
with cte as (
    select EmployeeHistId
         , EmpId
         , Name
         , Salary
         , StartDate
         , EndDate
         , row_number () over (
               partition by EmpId order by StartDate desc) as StartDateRank
    from EmployeeHist)
select n.EmpId
     , n.Name
     , o.Salary as OldSalary
     , n.Salary as NewSalary
     , o.EndDate as ChangeDate
from cte n
join cte o on o.EmpId = n.EmpId
    and n.StartDateRank = 1
    and o.StartDateRank = 2;
Use an outer join to also get employees who never got a raise.
These kinds of queries are always tricky because of data purity issues, for instance if StartDate and EndDate overlap.
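The rank 1 / rank 2 pairing can be sketched in Python on the question's sample rows. This is only an illustration: latest_changes and parse are made-up helper names, and the dates are assumed to be in MM/DD/YYYY format.

```python
from datetime import datetime


def parse(text):
    # Assumption: the question's dates are MM/DD/YYYY.
    return datetime.strptime(text, "%m/%d/%Y").date() if text else None


# Rows from the question:
# (EmployeeHistId, EmpId, Name, Salary, StartDate, EndDate)
history = [
    (1, 45, "Joe", 40000.00, parse("01/05/2011"), parse("01/10/2011")),
    (2, 45, "Joe", 42000.00, parse("01/11/2011"), None),
    (3, 46, "Bob", 20000.00, parse("01/12/2011"), None),
]


def latest_changes(history):
    """Pair each employee's newest record with the one just before it."""
    by_emp = {}
    for row in history:
        by_emp.setdefault(row[1], []).append(row)
    changes = []
    for rows in by_emp.values():
        # StartDate descending, like the row_number() ordering.
        rows.sort(key=lambda r: r[4], reverse=True)
        if len(rows) >= 2:  # inner join: no prior record, no output row
            new, old = rows[0], rows[1]
            changes.append((new[2], old[3], new[3], old[5]))
    return changes


changes = latest_changes(history)
```

Bob has only one record, so he drops out here in the same way he would with the inner join.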
I assume the StartDate of the new job will be the same as the EndDate of the previous job.
If that's the case, try this:
SELECT a.Name AS EmployeeName, b.Salary AS OldSalary, a.Salary AS NewSalary, a.StartDate AS ChangeDate
FROM EMPLOYEE a, EMPLOYEE b
WHERE a.EmpID = b.EmpID
  AND a.EndDate IS NULL
  AND a.StartDate = b.EndDate
You can use the APPLY operator, which works like a correlated join and handles these kinds of problems easily:
select e.name, curr.salary, prev.salary, prev.enddate
from employee e
cross apply ( -- to get the current
    select top(1) *
    from emphist h
    where e.empid = h.empid -- related to the employee
    order by startdate desc) curr
outer apply ( -- to get the prior, if any
    select top(1) *
    from emphist h
    where e.empid = h.empid -- related to the employee
      and h.EmployeeHistId <> curr.EmployeeHistId -- prevent curr=prev
    order by enddate desc) prev -- last ended