Multiple Group By and Count - sql

I was wondering if someone could help me with this. Here is a sample set of my data:
FirstName LastName Department Ticket Hours Shift DateWorked Key
Bob Smith Sleeves 23235 4 1 2017-01-01 001
Bob Smith Sleeves 12345 4 1 2017-01-01 001
Jim Bo Sleeves 12345 8 1 2017-01-01 002
Janet Moore Lids 78945 8 2 2017-01-01 003
Jon Bob Lids 45621 1.5 3 2017-01-01 004
Jon Bob Lids 45621 7.5 3 2017-01-01 004
Bob Smith Mugs 12345 8 1 2017-01-02
Jim Bo Lids 99999 8 3 2017-01-02
It should return something like this:
DateWorked Shift Department HeadCount
2017-01-01 1 Sleeves 2 (Bob Smith has two entries but counted as one and Jim Bo makes for 2)
2017-01-01 2 Lids 1 (Janet)
2017-01-01 3 Lids 1 (Jon)
Please note that all departments work all shifts. This is just a sample set. There can be anywhere from none to a hundred per department.
Also one employee could work multiple departments in one day! I don't know how to account for that.
This is what I have. So for this example it's not summing Bob Smith. It's counting him as two.
SELECT Scheduled, Department, [Shift], COUNT(*) as HeadCount
FROM EmployeeTickets
WHERE Scheduled >= '2017-01-01' AND Scheduled < '2017-12-31'
GROUP BY Scheduled, Department, [Shift]
ORDER BY Scheduled, Department, [Shift]
Thank you.
ETA I don't know if it helps but in the table there is a key per entry, so Bob Smith on Jan 1 would have a key for that day. His social security number is also in there. I'm trying to group by one of those somehow.

Just use DISTINCT:
SELECT Scheduled, Department, [Shift], COUNT( DISTINCT FirstName ) as HeadCount
FROM EmployeeTickets
WHERE Scheduled >= '2017-01-01' AND Scheduled < '2017-12-31'
GROUP BY Scheduled, Department, [Shift]
ORDER BY Scheduled, Department, [Shift]
Of course, this will be a problem if several people share the same first name, so I hope you have some EmployeeID in your table to tell the employees apart:
COUNT(DISTINCT EmployeeID)
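As a rough illustration of the COUNT(DISTINCT ...) approach, the sketch below loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. SQLite only stands in for whatever engine is really in use, and EmployeeID is a hypothetical column invented for the example (the question mentions a per-day key and a social security number, either of which might play that role).

```python
import sqlite3

# Sample rows from the question. EmployeeID is a hypothetical column
# added for this sketch; the real table layout is not fully shown.
rows = [
    (1, "Bob", "Smith", "Sleeves", 23235, 4.0, 1, "2017-01-01"),
    (1, "Bob", "Smith", "Sleeves", 12345, 4.0, 1, "2017-01-01"),
    (2, "Jim", "Bo", "Sleeves", 12345, 8.0, 1, "2017-01-01"),
    (3, "Janet", "Moore", "Lids", 78945, 8.0, 2, "2017-01-01"),
    (4, "Jon", "Bob", "Lids", 45621, 1.5, 3, "2017-01-01"),
    (4, "Jon", "Bob", "Lids", 45621, 7.5, 3, "2017-01-01"),
    (1, "Bob", "Smith", "Mugs", 12345, 8.0, 1, "2017-01-02"),
    (2, "Jim", "Bo", "Lids", 99999, 8.0, 3, "2017-01-02"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeTickets ("
    "EmployeeID INTEGER, FirstName TEXT, LastName TEXT, Department TEXT, "
    "Ticket INTEGER, Hours REAL, Shift INTEGER, DateWorked TEXT)"
)
conn.executemany(
    "INSERT INTO EmployeeTickets VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
)

# Count each employee once per date/shift/department, however many
# ticket rows they have in that group.
head_counts = conn.execute(
    """
    SELECT DateWorked, Shift, Department,
           COUNT(DISTINCT EmployeeID) AS HeadCount
    FROM EmployeeTickets
    WHERE DateWorked >= '2017-01-01' AND DateWorked < '2018-01-01'
    GROUP BY DateWorked, Shift, Department
    ORDER BY DateWorked, Shift, Department
    """
).fetchall()

for row in head_counts:
    print(row)
```

Because Department is part of the GROUP BY, an employee who works two departments on the same day is counted once in each department, which seems to be what the question is after.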

Related

How do I select a max date by person in a table

I am not too advanced with SSRS/SQL queries, and I need to write a report that pulls percentage allocations by person so they can be compared to a wage table to allocate the wages. These allocations change quarterly, but all allocations continue to be stored in the table. If a person's allocation did not change, they do NOT get a new entry in the table. Here is a sample table called Allocations.
First Name  Last Name  Date      Area  Percent
Smith       Bob        01/01/20  A     50.00
Smith       Bob        01/01/20  B     50.00
Doe         Jane       01/01/20  A     25.00
Doe         Jane       01/01/20  B     25.00
Doe         Jane       01/01/20  C     50.00
Doe         Jane       04/01/20  A     35.00
Doe         Jane       04/01/20  C     65.00
Wayne       Bruce      01/01/20  A     100.00
Wayne       Bruce      04/01/20  B     100.00
The results that I would want to have from this sample table when querying it are:
First Name  Last Name  Date      Area  Percent
Smith       Bob        01/01/20  A     50.00
Smith       Bob        01/01/20  B     50.00
Doe         Jane       04/01/20  A     35.00
Doe         Jane       04/01/20  C     65.00
Wayne       Bruce      04/01/20  B     100.00
However, I would also like to pull this by comparing it to a date that the user inputs, so that they could run this report at any point in time and get the correct "max" dates. So, for example, if there were also 7/1/20 dates in here, but the user input date was 6/30/20, I would NOT want to pull the 7/1/20 data. In other words, I would like to pull the rows with the maximum date by name w/o going over the user's input date.
Any idea on the best way to accomplish this?
Thanks in advance for any advice you can provide.
In SQL, ROW_NUMBER can be used to order records in groups by a particular field.
SELECT * FROM (
SELECT *, ROW_NUMBER()OVER(PARTITION BY Last_Name, First_Name ORDER BY DATE DESC) as ROW_NUM
FROM TABLE
) AS T
WHERE ROW_NUM = 1
Then you filter for ROW_NUM = 1.
However, I noticed that some people have several rows with the same date and you want all of them. In this case you'd want to use RANK, which allows for ties, so multiple records sharing the latest date are all captured.
SELECT * FROM (
SELECT *, RANK()OVER(PARTITION BY Last_Name, First_Name ORDER BY DATE DESC) as ROW_NUM
FROM TABLE
) AS T
WHERE ROW_NUM = 1
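The queries above do not yet deal with the user-supplied cutoff date from the question. One possible way to handle it, sketched here with Python's sqlite3 module and an in-memory SQLite database as a stand-in engine, is to filter on the cutoff inside the subquery before ranking. The column names (first_name, last_name, alloc_date, area, percent), the ISO date strings, the helper allocations_as_of, and the extra 2020-07-01 row are all assumptions made for the example. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Allocations ("
    "first_name TEXT, last_name TEXT, alloc_date TEXT, area TEXT, percent REAL)"
)
conn.executemany(
    "INSERT INTO Allocations VALUES (?, ?, ?, ?, ?)",
    [
        # Values in the column order the question lists them; dates as ISO text.
        ("Smith", "Bob", "2020-01-01", "A", 50.0),
        ("Smith", "Bob", "2020-01-01", "B", 50.0),
        ("Doe", "Jane", "2020-01-01", "A", 25.0),
        ("Doe", "Jane", "2020-01-01", "B", 25.0),
        ("Doe", "Jane", "2020-01-01", "C", 50.0),
        ("Doe", "Jane", "2020-04-01", "A", 35.0),
        ("Doe", "Jane", "2020-04-01", "C", 65.0),
        ("Wayne", "Bruce", "2020-01-01", "A", 100.0),
        ("Wayne", "Bruce", "2020-04-01", "B", 100.0),
        # Hypothetical later row, added only to show the cutoff working.
        ("Doe", "Jane", "2020-07-01", "A", 100.0),
    ],
)


def allocations_as_of(as_of):
    """Return each person's rows for their latest date on or before as_of."""
    return conn.execute(
        """
        SELECT first_name, last_name, alloc_date, area, percent
        FROM (
            SELECT a.*,
                   RANK() OVER (PARTITION BY last_name, first_name
                                ORDER BY alloc_date DESC) AS rnk
            FROM Allocations AS a
            WHERE alloc_date <= ?      -- ignore rows after the cutoff
        ) AS t
        WHERE rnk = 1                  -- ties on the latest date are all kept
        ORDER BY first_name, last_name, area
        """,
        (as_of,),
    ).fetchall()


june = allocations_as_of("2020-06-30")
for row in june:
    print(row)
```

In a report, the cutoff would come from the report parameter rather than a literal; the key point is that the date filter sits inside the ranked subquery, so later rows never compete for rank 1.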

How do I display rows from a max count?

I want to return all the rows for the hospital that has the most patients, based on a max count query. When I try to nest queries, I get every row of hospital data instead. I've looked at similar questions on Stack Overflow and other sites; it seems like a simple query, but I am not getting it.
select max(highest_hospital) as max_hospital
from (select count(hospital) as highest_hospital
from doctor
group by hospital)
highest_hospital
-------------
3
Doc ID Doctor Patient Hospital Medicine Cost
------ ------- ------ --------- ------ --------
1 Jim Bob Patient1 Town 1 Medicine 1 4000
2 Janice Smith Patient2 Town 2 Medicine 3 3000
3 Harold Brown Patient3 Town 2 Medicine 5 2000
4 Larry Owens Patient4 Town 2 Medicine 6 3000
5 Sally Brown Patient5 Town 3 Medicine 7 4000
6 Bob Jim Patient6 Town 4 Medicine 8 6000
Outcome should be return of 3 rows
Doc ID Doctor Patient Hospital Medicine Cost
------ ------- ------ --------- ------ --------
2 Janice Smith Patient2 Town 2 Medicine 3 3000
3 Harold Brown Patient3 Town 2 Medicine 5 2000
4 Larry Owens Patient4 Town 2 Medicine 6 3000
You can use window functions:
select d.*
from (select d.*, max(hospital_count) over () as max_hospital_count
from (select d.*, count(*) over (partition by hospital) as hospital_count
from doctor d
) d
) d
where hospital_count = max_hospital_count;
Edit:
Using GROUP BY is a pain. If you are only looking for a single hospital (even when there are ties), then in Oracle 12C you can do:
select d.*
from doctor d
where d.hospital = (select d2.hospital
from doctor d2
group by d2.hospital
order by count(*) desc
fetch first 1 row only
);
You can do this in earlier versions of Oracle using an additional subquery.
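For comparison, here is a sketch that gets the same three rows using only GROUP BY subqueries, with no window functions and no FETCH FIRST, and that keeps every hospital tied for the top count. It is a different technique from the two queries above, and it runs on an in-memory SQLite database through Python's sqlite3 module purely as a stand-in for Oracle. The column names (doc_id, doctor, patient, hospital, medicine, cost) are assumptions based on the question's sample output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doctor ("
    "doc_id INTEGER, doctor TEXT, patient TEXT, hospital TEXT, "
    "medicine TEXT, cost INTEGER)"
)
conn.executemany(
    "INSERT INTO doctor VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Jim Bob", "Patient1", "Town 1", "Medicine 1", 4000),
        (2, "Janice Smith", "Patient2", "Town 2", "Medicine 3", 3000),
        (3, "Harold Brown", "Patient3", "Town 2", "Medicine 5", 2000),
        (4, "Larry Owens", "Patient4", "Town 2", "Medicine 6", 3000),
        (5, "Sally Brown", "Patient5", "Town 3", "Medicine 7", 4000),
        (6, "Bob Jim", "Patient6", "Town 4", "Medicine 8", 6000),
    ],
)

busiest = conn.execute(
    """
    SELECT d.*
    FROM doctor AS d
    WHERE d.hospital IN (
        -- hospitals whose row count equals the largest per-hospital count
        SELECT hospital
        FROM doctor
        GROUP BY hospital
        HAVING COUNT(*) = (
            SELECT MAX(cnt)
            FROM (SELECT COUNT(*) AS cnt
                  FROM doctor
                  GROUP BY hospital) AS per_hospital
        )
    )
    ORDER BY d.doc_id
    """
).fetchall()

for row in busiest:
    print(row)
```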

Is there a way to list the most recent dates for an event based on data in other columns?

I am working on a query that shows the most recent job start date for each person with extended families within the past year (I should not show future dates). It is possible that multiple families (in multiple states) may have started their job on the same date. In that case, I need to list the state(s), both people, and the respective dates. However, I should only list each state/person pair once.
Additionally, if the person didn't start their job within the past year, I should still list the person's name, but the query should return NULL in place of the state name and NULL for the date.
Below is the data in the raw table:
LOC FAM PPL MILESTONE_ID MILESTONE_NAME START_DATE
WI Smith Mike 1 End College 9/4/2017 0:00
WI Smith Mike 2 Start Job 9/4/2017 0:00
WI Smith Bob 1 End College 6/4/2019
WI Smith Bob 2 Start Job 6/4/2019
IL Thomas Mike 1 End College 1/4/2019
IL Thomas Mike 2 Start Job 6/4/2019
IL Thomas Bob 1 End College 12/4/2019
IL Thomas Bob 2 Start Job 6/4/2019
I know that I need to use a subquery to get the most recent job start dates but my subquery isn't behaving as expected. I have also tried using a CTE but that isn't working either.
This is what I have so far. I haven't gotten the subquery to work correctly, and I still need to add the NULL handling described above.
Select family.*
From
FAMILY.KEYINFO as family
Inner Join
(Select family.milestone_id, MAX(family.start_date) as LatestDate
from FAMILY.keyinfo
group by milestone_id) groupeddate
on family.milestone_id=groupeddate.milestone
where family.start_date<= CURRENT_TIMESTAMP
and family.start_date > DATEADD(year,-1,GETDATE())
Below is what I would expect the answer to be if the query was correct:
LOC PPL START_DATE
N/A Mike N/A
N/A Mike N/A
WI Bob 6/4/2019
IL Mike 6/4/2019
IL Bob 6/4/2019
You seem to want window functions:
select f.*
from (select f.*,
rank() over (partition by fam order by start_date desc) as seqnum
from families f
where milestone_name = 'Start Job'
) f
where seqnum = 1;
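To see what a query of this shape returns on the question's sample rows, here is a small sketch using Python's sqlite3 module with an in-memory SQLite database standing in for SQL Server. The table name keyinfo, the ISO date strings, the helper latest_job_starts, and the fixed as_of date (used instead of the current timestamp so the 2019 sample data still falls inside the one-year window) are assumptions for the example. It does not attempt the NULL rows for people without a recent start. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE keyinfo ("
    "loc TEXT, fam TEXT, ppl TEXT, milestone_id INTEGER, "
    "milestone_name TEXT, start_date TEXT)"
)
conn.executemany(
    "INSERT INTO keyinfo VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("WI", "Smith", "Mike", 1, "End College", "2017-09-04"),
        ("WI", "Smith", "Mike", 2, "Start Job", "2017-09-04"),
        ("WI", "Smith", "Bob", 1, "End College", "2019-06-04"),
        ("WI", "Smith", "Bob", 2, "Start Job", "2019-06-04"),
        ("IL", "Thomas", "Mike", 1, "End College", "2019-01-04"),
        ("IL", "Thomas", "Mike", 2, "Start Job", "2019-06-04"),
        ("IL", "Thomas", "Bob", 1, "End College", "2019-12-04"),
        ("IL", "Thomas", "Bob", 2, "Start Job", "2019-06-04"),
    ],
)


def latest_job_starts(as_of):
    """Latest 'Start Job' rows per family within the year ending at as_of."""
    return conn.execute(
        """
        SELECT loc, fam, ppl, start_date
        FROM (
            SELECT k.*,
                   RANK() OVER (PARTITION BY fam
                                ORDER BY start_date DESC) AS seqnum
            FROM keyinfo AS k
            WHERE milestone_name = 'Start Job'
              AND start_date <= :as_of                  -- no future dates
              AND start_date > date(:as_of, '-1 year')  -- past year only
        ) AS f
        WHERE seqnum = 1
        ORDER BY fam, ppl
        """,
        {"as_of": as_of},
    ).fetchall()


result = latest_job_starts("2020-01-01")  # assumed reference date
for row in result:
    print(row)
```

RANK keeps both Thomas rows because they tie on the latest date, while Mike Smith drops out since his 2017 start is outside the window.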

Select multiple distincts with group by (I think)

I was wondering if someone could help me with this. Here is a sample set of my data:
FirstName LastName Department Comment DateWorked
Bob Smith Sleeves 2017-01-01
Jim Bo Sleeves 2017-01-01
Janet Moore Lids No Show 2017-01-01
Jon Bob Lids 2017-01-01
Bob Smith Mugs 2017-01-02
Bob Smith Sleeves 2017-01-03
Jim Bo Sleeves 2017-01-03
Janet Moore Lids 2017-01-03
Jon Bob Lids 2017-01-03
It should return something like this:
DateWorked Department HeadCount
2017-01-01 Sleeves 2
2017-01-01 Lids 2
2017-01-02 Mugs 1
2017-01-03 Sleeves 2
2017-01-03 Lids 2
So far I've tried a few things.
This is what I want but it's not working
SELECT Count(Distinct(FirstName, LastName, Department, Scheduled), Notes) FROM Employees
Where Scheduled < 20171231
and Scheduled > 20170101
Group by Scheduled, Department, FirstName, LastName, Department, Comment
This just gives me a number.
select count(*)
from
(select count(*) CT
from Employees
group by Scheduled, Department) TD
This errors out.
SELECT COUNT(DISTINCT FirstName, LastName, Department) FROM Employees
Can anyone help?
Thanks
You would appear to want:
SELECT DateWorked, Department, COUNT(*) as HeadCount
FROM Employees
WHERE Scheduled < 20171231 AND Scheduled > 20170101
GROUP BY DateWorked, Department
ORDER BY DateWorked, Department;
The above keeps the date comparisons as you have in your query, although they seem wrong.
I would recommend writing the date comparisons as:
SELECT DateWorked, Department, COUNT(*) as HeadCount
FROM Employees
WHERE Scheduled >= '2017-01-01' AND Scheduled < '2017-12-31'
GROUP BY DateWorked, Department
ORDER BY DateWorked, Department;
This fixes the date comparisons, to be more aligned with your desired results.
You don't specify your database. YYYY-MM-DD is the ISO standard date format and works across most databases.
SELECT DateWorked, Department, COUNT(*) AS HeadCount
FROM Employees
WHERE Scheduled < 20171231 AND Scheduled > 20170101
GROUP BY Department, DateWorked
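If the table can contain more than one row per person, department and day, COUNT(*) would overcount, and the multi-column COUNT(DISTINCT a, b, c) from the question is rejected by many engines. A possible workaround, sketched below with Python's sqlite3 module and an in-memory SQLite database as a stand-in, is to de-duplicate in a derived table first and count afterwards. The duplicated Bob Smith row is hypothetical and exists only to show the effect; the date range is written as a half-open interval covering all of 2017, which is also an assumption about what was intended.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "FirstName TEXT, LastName TEXT, Department TEXT, "
    "Comment TEXT, DateWorked TEXT)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?)",
    [
        ("Bob", "Smith", "Sleeves", None, "2017-01-01"),
        # Hypothetical duplicate of the row above, added for illustration.
        ("Bob", "Smith", "Sleeves", None, "2017-01-01"),
        ("Jim", "Bo", "Sleeves", None, "2017-01-01"),
        ("Janet", "Moore", "Lids", "No Show", "2017-01-01"),
        ("Jon", "Bob", "Lids", None, "2017-01-01"),
        ("Bob", "Smith", "Mugs", None, "2017-01-02"),
        ("Bob", "Smith", "Sleeves", None, "2017-01-03"),
        ("Jim", "Bo", "Sleeves", None, "2017-01-03"),
        ("Janet", "Moore", "Lids", None, "2017-01-03"),
        ("Jon", "Bob", "Lids", None, "2017-01-03"),
    ],
)

head_counts = conn.execute(
    """
    SELECT DateWorked, Department, COUNT(*) AS HeadCount
    FROM (
        -- one row per person, department and day
        SELECT DISTINCT FirstName, LastName, Department, DateWorked
        FROM Employees
        WHERE DateWorked >= '2017-01-01' AND DateWorked < '2018-01-01'
    ) AS d
    GROUP BY DateWorked, Department
    ORDER BY DateWorked, Department
    """
).fetchall()

for row in head_counts:
    print(row)
```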

Problem with GROUP BY statement (SQL)

I have a table GAMES with this information:
Id_Game Id_Player1 Id_Player2 Week
--------------------------------------
1211 Peter John 2
1215 John Louis 13
1216 Louis Peter 17
I would like to get a list of the last week when each player has played, and the number of games, which should be this:
Id_Player Week numberGames
-----------------------------
Peter 17 2
John 13 2
Louis 17 2
But instead I get this one (notice on Peter week):
Id_Player Week numberGames
-----------------------------
Peter 2 2
John 13 2
Louis 17 2
What I do is this:
SELECT Id_Player,
MAX(Week) AS Week,
COUNT(*) as numberGames
FROM ((SELECT Id_Player1 as Id_Player, Week
FROM Games)
UNION ALL
(SELECT Id_Player2 as Id_Player, Week
FROM Games)) AS g2
GROUP BY Id_Player;
Could anyone help me to find the mistake?
What is the datatype of the Week column? If Week is a varchar, MAX compares the values as strings rather than numbers, and as a string '2' sorts after '17', which would give you this behavior.
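The effect of a text-typed Week column can be reproduced with a small sketch in Python's sqlite3 module, using an in-memory SQLite database as a stand-in for the asker's engine. Declaring Week as TEXT is an assumption made to mimic a varchar column, and last_weeks is a hypothetical helper that runs the question's query with a chosen expression for the week.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Week is declared TEXT on purpose, to mimic a varchar column.
conn.execute(
    "CREATE TABLE Games ("
    "Id_Game INTEGER, Id_Player1 TEXT, Id_Player2 TEXT, Week TEXT)"
)
conn.executemany(
    "INSERT INTO Games VALUES (?, ?, ?, ?)",
    [
        (1211, "Peter", "John", "2"),
        (1215, "John", "Louis", "13"),
        (1216, "Louis", "Peter", "17"),
    ],
)


def last_weeks(week_expr):
    """Run the question's query, taking MAX over the given week expression."""
    sql = f"""
        SELECT Id_Player, MAX({week_expr}) AS Week, COUNT(*) AS numberGames
        FROM (SELECT Id_Player1 AS Id_Player, Week FROM Games
              UNION ALL
              SELECT Id_Player2 AS Id_Player, Week FROM Games) AS g2
        GROUP BY Id_Player
    """
    return {player: (week, games) for player, week, games in conn.execute(sql)}


as_text = last_weeks("Week")                      # string comparison
as_number = last_weeks("CAST(Week AS INTEGER)")   # numeric comparison

print(as_text["Peter"], as_number["Peter"])
```

With the cast, Peter's last week comes out as 17. Under plain string comparison John's '2' also beats '13', so this sketch does not reproduce the question's table exactly; the longer-term fix would presumably be to store Week in a numeric column.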