SQL - Find number of new starters and leavers between snapshot dates

I have an SQL table, staff, that takes a snapshot on specific dates and adds a row for every StaffID with the corresponding DateID.
I need to find out how many staff have joined since the previous DateID, and how many have left.
So in the example staff table below, at DateID B, StaffIDs 002 and 003 from DateID A aren't there so have 'left', and DateID B has StaffIDs 004, 005 and 006 that were not there at DateID A so are 'new'.
StaffID DateID
007 C
005 C
006 B
005 B
004 B
001 B
003 A
002 A
001 A
I've summarised how these results would appear below.
DateID New Leavers
A 0 2
B 3 2
C 1 3
My current and only way of solving this is to pair each DateID with the DateID before it and left join the older date, counting the rows where the old date is null to get the number of new staff, then swapping the tables round for the leavers.
SELECT t1.DateID, count(*) AS Total
FROM
(SELECT *
FROM staff
WHERE DateID = 'B') t1
LEFT JOIN
(SELECT *
FROM staff
WHERE DateID = 'A') t2
ON t1.StaffID = t2.StaffID
WHERE t2.StaffID is null
GROUP BY t1.DateID
This method is horribly inefficient with a larger table, and I'm hoping someone has ideas for a way to do it in one script. Alternatively, one script just for new staff and another just for leavers would be as good.
As requested by @Larnu, I've added a snapshot table that holds all the DateIDs. The staff table is filtered to show only DateIDs that are weekly.
DateID Weekly Monthly Yearly
A Y Y N
B Y N N
C Y N N
D N N N
E Y Y N
F N N Y

LEAD and LAG window functions would help here.
Since the DateIDs are not consecutive, you need to calculate LEAD/LAG on the dates table as well, and join the two:
SELECT
s.DateID,
[New] = COUNT(CASE WHEN s.PrevID IS NULL OR s.PrevID <> d.PrevDateID THEN 1 END),
Leavers = COUNT(CASE WHEN s.NextID IS NULL OR s.NextID <> d.NextDateID THEN 1 END)
FROM (
SELECT *,
PrevDateID = LAG(DateID) OVER (ORDER BY DateID),
NextDateID = LEAD(DateID) OVER (ORDER BY DateID)
FROM Dates d
) d
JOIN (
SELECT *,
PrevID = LAG(s.DateID) OVER (PARTITION BY StaffID ORDER BY DateID),
NextID = LEAD(s.DateID) OVER (PARTITION BY StaffID ORDER BY DateID)
FROM staff s
) s ON s.DateID = d.DateID
GROUP BY
s.DateID;
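To see what the LAG/LEAD approach produces on the question's sample data, here is a minimal sketch that runs an equivalent query through Python's built-in sqlite3 module. It assumes an SQLite build with window function support (3.25 or later); the dates table holding only A, B and C stands in for the weekly rows of the snapshot table, and the T-SQL `alias = expression` syntax is rewritten as `expression AS alias`.

```python
import sqlite3

# In-memory copy of the question's sample data. The "dates" table is an
# assumption: it stands in for the weekly rows of the snapshot table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (StaffID TEXT, DateID TEXT)")
conn.execute("CREATE TABLE dates (DateID TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("007", "C"), ("005", "C"), ("006", "B"), ("005", "B"), ("004", "B"),
     ("001", "B"), ("003", "A"), ("002", "A"), ("001", "A")],
)
conn.executemany("INSERT INTO dates VALUES (?)", [("A",), ("B",), ("C",)])

# A staff row is "new" when the staff member's previous snapshot is missing
# or is not the snapshot immediately before this one; "leaver" is the mirror
# image using the next snapshot.
query = """
SELECT s.DateID,
       SUM(CASE WHEN s.PrevID IS NULL OR s.PrevID <> d.PrevDateID THEN 1 ELSE 0 END) AS New,
       SUM(CASE WHEN s.NextID IS NULL OR s.NextID <> d.NextDateID THEN 1 ELSE 0 END) AS Leavers
FROM (
    SELECT DateID,
           LAG(DateID)  OVER (ORDER BY DateID) AS PrevDateID,
           LEAD(DateID) OVER (ORDER BY DateID) AS NextDateID
    FROM dates
) d
JOIN (
    SELECT StaffID, DateID,
           LAG(DateID)  OVER (PARTITION BY StaffID ORDER BY DateID) AS PrevID,
           LEAD(DateID) OVER (PARTITION BY StaffID ORDER BY DateID) AS NextID
    FROM staff
) s ON s.DateID = d.DateID
GROUP BY s.DateID
ORDER BY s.DateID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On this data the rows come out as A (3 new, 2 leavers), B (3, 3) and C (1, 2). The first snapshot has no predecessor, so everyone in it counts as new, and the last has no successor, so everyone in it counts as a leaver. Leavers here are attributed to the last snapshot in which the person appears, so the edge rows may need special handling depending on how the counts should be reported.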

Related

How do I join two tables in SQL limiting the second table data depending on the first one?

I have two tables looking something like this:
IMP DATE CAT
A 03/03/2016 1
B 04/04/2016 1
C 09/09/2016 2
D 01/01/2017 1
E 02/02/2017 1
F 03/03/2017 2
G 04/04/2017 2
===================
EXP DATE CAT
H 01/01/2016 1
I 05/05/2016 1
J 07/07/2016 2
K 11/11/2016 2
L 01/01/2017 1
M 03/03/2017 1
N 04/04/2017 2
O 05/05/2017 2
I want to join the first table to the second one but limit the lines joined from the second table by the latest date on the first table (per category).
The result I'm looking for would be every row in both tables except Item "M" (because Cat 1 in Table 1 has a latest date of February) and Item "O" (because Cat 2 in Table 1 has a latest date of April).
I've tried conditionals within a where clause on the 2nd table but haven't got far.
Is there a simple way to do this? Any help is appreciated. I'm using SQL Server 2008 by the way.
Your description of the problem is specifically about using join. That suggests a query like this:
select . . .
from (select t1.*, max(t1.date) over (partition by t1.cat) as maxdate
from table1 t1
) t1 join
table2 t2
on t1.cat = t2.cat and t2.date <= t1.maxdate;
The desired output and its format are still not clear.
Are you looking for this?
;With CTE as
(
select *, row_number()over(partition by cat order by DATEs desc) rn
from #table1
)
--select * from cte
--where rn=1
select * from cte t1
left join #table2 t2
on t1.CAT=t2.CAT and t2.DATEs<=t1.DATEs
where rn=1
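The filtering rule itself (keep a row of the second table only if its date is no later than the latest date for the same category in the first table) can be checked with a small sketch using Python's sqlite3 module. The table names table1/table2 and the column names imp, exp, dt and cat are assumptions, the dates are stored as ISO strings so that they compare correctly as text, and a grouped derived table is used in place of the windowed max() from the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (imp TEXT, dt TEXT, cat INTEGER)")
conn.execute("CREATE TABLE table2 (exp TEXT, dt TEXT, cat INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    ("A", "2016-03-03", 1), ("B", "2016-04-04", 1), ("C", "2016-09-09", 2),
    ("D", "2017-01-01", 1), ("E", "2017-02-02", 1), ("F", "2017-03-03", 2),
    ("G", "2017-04-04", 2),
])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", [
    ("H", "2016-01-01", 1), ("I", "2016-05-05", 1), ("J", "2016-07-07", 2),
    ("K", "2016-11-11", 2), ("L", "2017-01-01", 1), ("M", "2017-03-03", 1),
    ("N", "2017-04-04", 2), ("O", "2017-05-05", 2),
])

# Latest table1 date per category, then keep only table2 rows on or before it.
kept = [r[0] for r in conn.execute("""
    SELECT t2.exp
    FROM table2 t2
    JOIN (SELECT cat, MAX(dt) AS maxdate FROM table1 GROUP BY cat) m
      ON m.cat = t2.cat AND t2.dt <= m.maxdate
    ORDER BY t2.exp
""")]
print(kept)
```

Items M and O are dropped, which matches the result described in the question.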

Selecting objects that are associated with similar datasets

I'm trying to select all company rows from a [Company] table that share the same number of employees with at least one other company (from an [Employee] table that has a CompanyId column), where each group of respective employees shares the same set of LocationIds (a column in the [Employee] table) and in the same proportion.
So, for instance, two companies with three employees each that have the locationIds 1,2, and 2, would be selected by this query.
[Employee]
EmployeeId | CompanyId | LocationId |
========================================
1 | 1 | 1
2 | 1 | 2
3 | 1 | 2
4 | 2 | 1
5 | 2 | 2
6 | 2 | 2
7 | 3 | 3
[Company]
CompanyId |
============
1 |
2 |
3 |
Returns the CompanyIds:
======================
1
2
CompanyIds 1 and 2 are selected because they share in common with at least one other company: 1. the number of employees (3 employees); and 2. the number/proportion of LocationIds associated with those employees (1 employee has LocationId 1 and 2 employees have LocationId 2).
So far I think I want to use a HAVING COUNT(?) > 1 statement, but I'm having trouble working out the details. Does anyone have any suggestions?
This is ugly, but the only way I can think of to do it:
;with CTE as (
select c.Id,
(
select e.Location, count(e.Id) [EmployeeCount]
from Employee e
where e.IdCompany=c.Id
group by e.Location
order by e.Location
for xml auto
) LocationEmployeeData
from Company c
)
select c.Id
from Company c
join (
select x.LocationEmployeeData, count(x.Id) [CompanyCount]
from CTE x
group by x.LocationEmployeeData
having count(x.Id) >= 2
) y on y.LocationEmployeeData = (select LocationEmployeeData from CTE where Id = c.Id)
See fiddle: http://www.sqlfiddle.com/#!6/6bc16/5
It works by encoding the Employee count per Location data (multiple rows) into an xml string for each Company.
The CTE code on its own:
select c.Id,
(
select e.Location, count(e.Id) [EmployeeCount]
from Employee e
where e.IdCompany=c.Id
group by e.Location
order by e.Location
for xml auto
) LocationEmployeeData
from Company c
Produces data like:
Id LocationEmployeeData
1 <e Location="1" EmployeeCount="2"/><e Location="2" EmployeeCount="1"/>
2 <e Location="1" EmployeeCount="2"/><e Location="2" EmployeeCount="1"/>
3 <e Location="3" EmployeeCount="1"/>
Then it compares companies based on this string (rather than trying to ascertain whether multiple rows match, etc).
An alternative solution could look like this. However, it would need performance testing first (I don't feel quite confident about the <> type join).
with List as
(
select
IdCompany,
Location,
row_number() over (partition by IdCompany order by Location) as RowId,
count(1) over (partition by IdCompany) as LocCount
from
Employee
)
select
A.IdCompany
from List as A
inner join List as B on A.IdCompany <> B.IdCompany
and A.RowID = B.RowID
and A.LocCount = B.LocCount
group by
A.IdCompany, A.LocCount
having
sum(case when A.Location = B.Location then 1 else 0 end) = A.LocCount
Related fiddle: http://sqlfiddle.com/#!6/d9f2e/1
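The idea behind the first answer (reduce each company to one comparable "fingerprint" of its per-location employee counts, then look for fingerprints shared by two or more companies) can also be sketched outside the database. The snippet below uses Python's sqlite3 module with the question's column names (CompanyId, LocationId); building the fingerprint as a Python tuple instead of an XML string is a substitution made for illustration, not the answer's method.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (EmployeeId INTEGER, CompanyId INTEGER, LocationId INTEGER)"
)
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    (1, 1, 1), (2, 1, 2), (3, 1, 2),
    (4, 2, 1), (5, 2, 2), (6, 2, 2),
    (7, 3, 3),
])

# One row per (company, location) with its employee count, in a stable order.
counts = conn.execute("""
    SELECT CompanyId, LocationId, COUNT(*)
    FROM Employee
    GROUP BY CompanyId, LocationId
    ORDER BY CompanyId, LocationId
""").fetchall()

# Fingerprint of a company: the ordered list of (location, employee count).
fingerprints = defaultdict(list)
for company, location, n in counts:
    fingerprints[company].append((location, n))

# Group companies by fingerprint and keep those that share one.
groups = defaultdict(list)
for company, fp in fingerprints.items():
    groups[tuple(fp)].append(company)

matching = sorted(c for members in groups.values() if len(members) > 1 for c in members)
print(matching)
```

Two companies with equal fingerprints necessarily have the same total number of employees, so that condition does not need a separate check.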

SQL Procedure (Sybase Advantage Database Server)

I have a table containing the data below:
EMPCODE PAYCODE AMOUNT
------------------------
001 A 100
001 B 200
002 A 120
002 C 80
003 B 50
003 D 20
All PAYCODE in table at the moment are A, B, C, D.
However, other EMPCODE with other new PAYCODE such as E or F, might be added in later on.
EMPCODE 001 has PAYCODE A and B (he doesn't have PAYCODE C and D).
EMPCODE 002 has PAYCODE A and C (he doesn't have PAYCODE B and D).
EMPCODE 003 has PAYCODE B and D (he doesn't have PAYCODE A and C).
I want to create a simple stored procedure / SQL which can add the dummy records for each EMPCODE for PAYCODE which they don't own.
My expected result as below:
EMPCODE PAYCODE AMOUNT
------------------------
001 A 100
001 B 200
001 C 0
001 D 0
002 A 120
002 B 0
002 C 80
002 D 0
003 A 0
003 B 50
003 C 0
003 D 20
I can achieve that through coding but I need to do it via a stored procedure.
Is there any SQL stored procedure to achieve this?
Thanks in advance for any answers.
Use Join to get the result. SQLFiddle
SELECT C.EMPCODE, C.PAYCODE, ISNULL(D.AMOUNT, 0) AS AMOUNT FROM
(
SELECT * FROM
(SELECT EMPcode from Test GROUP BY EMPCODE) AS A,
(SELECT Paycode FROM Test GROUP BY PAYCODE) AS B
) AS C
LEFT JOIN Test AS D
ON C.EMPCODE=D.EMPCODE AND C.PAYCODE = D.PAYCODE
UPDATE:
1) To get the distinct EMPCODE from table
(SELECT EMPcode from Test GROUP BY EMPCODE) AS A
2) To get the distinct PAYCODE from table
(SELECT Paycode FROM Test GROUP BY PAYCODE) AS B
3) To get the all PAYCODE value for each Empcode.
SELECT * FROM
(SELECT EMPcode from Test GROUP BY EMPCODE) AS A,
(SELECT Paycode FROM Test GROUP BY PAYCODE) AS B
You can do this by generating all the combinations of empcode and paycode using a cross join in a sub-query that you then use as a derived table for a left join. To not insert already existing rows you should exclude them using a correlated not exists predicate. Written as a stored procedure it could look like this:
create proc insert_missing_values as
insert your_table (empcode, paycode, amount)
select distinct codes.empcode, codes.paycode, isnull(your_table.amount, 0) amount
from (
select t1.empcode, t2.paycode
from your_table t1, your_table t2
group by t1.empcode, t2.paycode
) codes
left join your_table on
codes.empcode = your_table.empcode
and
codes.paycode = your_table.paycode
where not exists (
select 1 from your_table
where codes.empcode = your_table.empcode and codes.paycode = your_table.paycode
)
Sample SQL Fiddle
Edit: as Sybase ASE doesn't support the explicit cross join, you can use an unqualified implicit join with the same effect by writing from TableA, TableB, which returns the Cartesian product of rows. See this Wikipedia article for an explanation.
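As a quick check of the cross join plus left join technique described in both answers, the sketch below runs it against the question's data using Python's sqlite3 module. It runs on SQLite rather than Advantage Database Server, so the explicit CROSS JOIN and COALESCE used here are assumptions about what the target engine accepts; the table name Test follows the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (EMPCODE TEXT, PAYCODE TEXT, AMOUNT INTEGER)")
conn.executemany("INSERT INTO Test VALUES (?, ?, ?)", [
    ("001", "A", 100), ("001", "B", 200),
    ("002", "A", 120), ("002", "C", 80),
    ("003", "B", 50), ("003", "D", 20),
])

# Every (EMPCODE, PAYCODE) combination, with the stored amount where a row
# exists and 0 where it does not.
filled = conn.execute("""
    SELECT e.EMPCODE, p.PAYCODE, COALESCE(t.AMOUNT, 0) AS AMOUNT
    FROM (SELECT DISTINCT EMPCODE FROM Test) e
    CROSS JOIN (SELECT DISTINCT PAYCODE FROM Test) p
    LEFT JOIN Test t
           ON t.EMPCODE = e.EMPCODE AND t.PAYCODE = p.PAYCODE
    ORDER BY e.EMPCODE, p.PAYCODE
""").fetchall()
for row in filled:
    print(row)
```

Because the pay codes are read from the table itself, a newly added code such as E or F is picked up without changing the query.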

SQL - only select results that do not have a specific status value in a corresponding table

I have two tables, and I only want to get the Student IDs where they have perfect attendance for all months (they do not have a PerfectAttendance value of N for any month). These tables will have hundreds of millions of rows, so I was trying to come up with an approach that doesn't require a full separate subquery. If anyone has any recommendations, please let me know:
Table Student:
ID Name
------------
1 A
2 B
Table Attendance:
ID Month PerfectAttendance
---------------------------------
1 1 Y
1 2 Y
1 3 Y
1 4 Y
1 5 Y
1 6 Y
1 7 Y
1 8 Y
1 9 Y
1 10 Y
1 11 Y
1 12 Y
2 1 Y
2 2 Y
2 3 Y
2 4 Y
2 5 Y
2 6 Y
2 7 Y
2 8 Y
2 9 Y
2 10 Y
2 11 Y
2 12 N
SELECT *
FROM dbo.Student S
WHERE NOT EXISTS(SELECT 1 FROM dbo.Attendance
WHERE PerfectAttendance = 'N'
AND ID = S.ID);
My suggestion for this would be to query the table and get the number of months that each student has perfect attendance. Once you've done that, you can filter on the count being 12 (since there are twelve months).
Try this:
SELECT s.id, s.name, COUNT(*) AS numPerfectMonths
FROM student s JOIN attendance a ON s.id = a.id
WHERE a.perfectAttendance = 'Y'
GROUP BY s.id, s.name
HAVING COUNT(*) = 12;
Here is the SQL Fiddle for you.
EDIT
I made the assumption you will have 12 rows for each student. However, let's say you ran this in October and you want to see which students have a perfect attendance up to that point. You can use a subquery to pull for students without perfect attendance, and filter them out using NOT IN like so:
SELECT id
FROM student
WHERE id
NOT IN(SELECT s.id
FROM student s JOIN attendance a ON s.id = a.id
WHERE a.perfectAttendance = 'N'
GROUP BY s.id
HAVING COUNT(*) > 0);
Here is an updated SQL Fiddle. To test this one, try deleting one of the rows for id number 1, and you'll still see that they are returned with perfect attendance.
Assuming you have 12 records per student in the attendance table, based on your data, you can do it with a GROUP BY and HAVING clause.
SELECT S.ID, S.NAME
FROM Student S
JOIN Attendance A
on S.ID = A.ID
AND A.PerfectAttendance = 'Y'
GROUP BY S.ID, S.NAME
HAVING COUNT(*) = 12
I think Lamak's answer is probably the clearest and best-performing, but here is another variation on the GROUP BY method suggested by others, when you don't specifically look for a total of 12 months:
;WITH PerfectAttendance AS (
SELECT a.id
FROM Attendance a
GROUP BY a.id
HAVING MIN(a.PerfectAttendance) = 'Y'
)
SELECT s.id, s.Name
FROM PerfectAttendance p
JOIN Student s ON p.id = s.id;
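Two of the approaches above, the NOT EXISTS filter and the HAVING MIN(PerfectAttendance) = 'Y' variant, can be compared side by side on the question's data with a short sketch using Python's sqlite3 module. It only demonstrates that they agree on this sample; it says nothing about how either performs on hundreds of millions of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (ID INTEGER, Name TEXT)")
conn.execute(
    "CREATE TABLE Attendance (ID INTEGER, Month INTEGER, PerfectAttendance TEXT)"
)
conn.executemany("INSERT INTO Student VALUES (?, ?)", [(1, "A"), (2, "B")])

# Student 1 has 'Y' for all twelve months; student 2 has 'N' in month 12.
attendance = [(1, m, "Y") for m in range(1, 13)]
attendance += [(2, m, "Y") for m in range(1, 12)] + [(2, 12, "N")]
conn.executemany("INSERT INTO Attendance VALUES (?, ?, ?)", attendance)

# Approach 1: keep students for whom no 'N' row exists.
not_exists_ids = [r[0] for r in conn.execute("""
    SELECT S.ID
    FROM Student S
    WHERE NOT EXISTS (SELECT 1 FROM Attendance A
                      WHERE A.PerfectAttendance = 'N' AND A.ID = S.ID)
    ORDER BY S.ID
""")]

# Approach 2: 'N' sorts before 'Y', so MIN(...) = 'Y' means no 'N' is present.
min_ids = [r[0] for r in conn.execute("""
    SELECT s.ID
    FROM Attendance a
    JOIN Student s ON s.ID = a.ID
    GROUP BY s.ID
    HAVING MIN(a.PerfectAttendance) = 'Y'
    ORDER BY s.ID
""")]
print(not_exists_ids, min_ids)
```

One difference worth keeping in mind: a student with no attendance rows at all would be returned by the NOT EXISTS query but not by the MIN variant, since the latter starts from the Attendance table.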

Get the max value of a column from set of rows

I have a table like this
Table A:
Id Count
1 4
1 16
1 8
2 10
2 15
3 18
etc
Table B:
1 sample1.file
2 sample2.file
3 sample3.file
TABLE C:
Count fileNumber
16 1234
4 2345
15 3456
18 4567
and so on...
What I want is this
1 sample1.file 1234
2 sample2.file 3456
3 sample3.file 4567
To get the max value from table A I used
Select MAX (Count) from A where Id='1'
This works well but my problem is when combining data with another table.
When I join Table B and Table A, I need to get the MAX for all Ids, and in my query I don't know what Id is.
This is my query
SELECT B.*,C.*
JOIN A on A.Id = B.ID
JOIN C on A.id = B.ID
WHERE (SELECT MAX(COUNT)
FROM A
WHERE Id = <what goes here????>)
To summarise, what I want is Values from Table B, FileNumber from Table c (where the count is Max for ID from table A).
UPDATE: Correcting table C above. Looks like I need Table A.
I think this is the query you're looking for:
select b.*, c.filenumber from b
join (
select id, max(count) as count from a
group by id
) as NewA on b.id = NewA.id
join c on NewA.count = c.count
However, you should take into account that I don't get why for id=1 in tableA you choose the 16 to match against table C (which is the max) and for id=2 in tableA you choose the 10 to match against table C (which is the min). I assumed you meant the max in both cases.
Edit:
I see you've updated tableA data. The query results in this, given the previous data:
+----+---------------+------------+
| ID | FILENAME | FILENUMBER |
+----+---------------+------------+
| 1 | sample1.file | 1234 |
| 2 | sample2.file | 3456 |
| 3 | sample3.file | 4567 |
+----+---------------+------------+
Here is a working example
Using Mosty’s working example (renaming the keyword count to cnt for a column name), this is another approach:
with abc as (
select
a.id,
a.cnt,
rank() over (
partition by a.id
order by cnt desc
) as rk,
b.filename
from a join b on a.id = b.id
)
select
abc.id, abc.filename, c.filenumber
from abc join c
on c.cnt = abc.cnt
where rk = 1;
select
PreMax.ID,
B.FileName,
C2.FileNumber
from
( select C.id, max( C.count ) maxPerID
from TableC C
group by C.ID
order by C.ID ) PreMax
JOIN TableC C2
on PreMax.ID = C2.ID
AND PreMax.maxPerID = C2.Count
JOIN TableB B
on PreMax.ID = B.ID
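The "maximum per Id, then join" pattern from the first answer can be verified against the question's sample data with a minimal sketch using Python's sqlite3 module. The column names id, cnt, filename and filenumber are assumptions (cnt replaces count, as in the second answer, to avoid the keyword).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, cnt INTEGER)")
conn.execute("CREATE TABLE b (id INTEGER, filename TEXT)")
conn.execute("CREATE TABLE c (cnt INTEGER, filenumber INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?)",
                 [(1, 4), (1, 16), (1, 8), (2, 10), (2, 15), (3, 18)])
conn.executemany("INSERT INTO b VALUES (?, ?)",
                 [(1, "sample1.file"), (2, "sample2.file"), (3, "sample3.file")])
conn.executemany("INSERT INTO c VALUES (?, ?)",
                 [(16, 1234), (4, 2345), (15, 3456), (18, 4567)])

# Reduce table a to one row per id holding its maximum cnt, then use that
# value to look up the file number in table c.
result = conn.execute("""
    SELECT b.id, b.filename, c.filenumber
    FROM b
    JOIN (SELECT id, MAX(cnt) AS cnt FROM a GROUP BY id) AS NewA
      ON b.id = NewA.id
    JOIN c ON NewA.cnt = c.cnt
    ORDER BY b.id
""").fetchall()
for row in result:
    print(row)
```

The derived table removes the need to know the Id in advance, which was the sticking point in the original WHERE clause.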