Finding a sql query to get the latest associated date for each grouping - sql

I have a sql table of payroll data that has wage rates and effective dates associated with those wage rates, as well as hours worked on various dates. It looks somewhat like this:
EMPID DateWorked Hours WageRate EffectiveDate
1 1/1/2010 10 7.00 6/1/2009
1 1/1/2010 10 7.25 6/10/2009
1 1/1/2010 10 8.00 2/1/2010
1 1/10/2010 ...
2 1/1/2010 ...
...
And so on. Basically, the data has been combined in such a way that for every day worked, all of the employee's wage history is joined together, and I want to grab the wage rate associated with the LATEST effective date that is not later than the date worked. So in the example above, the rate of 7.25 that became effective on 6/10/2009 is what I want.
What kind of query can I put together for this? I can use MAX(EffectiveDate) along with a criterion that it falls on or before the work date, but that only gives me the latest date itself; I want the associated wage. I am using SQL Server for this.
Alternatively, I have the original tables that were used to create this data. One contains the EMPID, the dates worked, and the hours; the other contains the list of wage rates and effective dates. Is there a way to join these instead that would correctly apply the right wage rate for each work day?
I was thinking that I'd want to group by EMPID and then DateWorked, and do something from there. I want to get a result that gives me the wage rate that actually is the latest effective rate for each date worked.

select p.*
from (
select EMPID, DateWorked, Max(EffectiveDate) as MaxEffectiveDate
from Payroll
where EffectiveDate <= DateWorked
group by EMPID, DateWorked
) pm
inner join Payroll p on pm.EMPID = p.EMPID and pm.DateWorked = p.DateWorked and pm.MaxEffectiveDate = p.EffectiveDate
Output:
EMPID DateWorked Hours WageRate EffectiveDate
----------- ----------------------- ----------- --------------------------------------- -----------------------
1 2010-01-01 00:00:00.000 10 7.25 2009-06-10 00:00:00.000
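The same result can probably also be reached with a window function instead of a self-join. The following is only a sketch, assuming the combined table is named Payroll as in the query above and that SQL Server 2005 or later is available; the CTE name ranked and the column rn are made up for illustration.

```sql
-- Number each wage row per (EMPID, DateWorked), newest effective date first,
-- considering only rates already in effect on the date worked.
WITH ranked AS (
    SELECT p.EMPID, p.DateWorked, p.Hours, p.WageRate, p.EffectiveDate,
           ROW_NUMBER() OVER (PARTITION BY p.EMPID, p.DateWorked
                              ORDER BY p.EffectiveDate DESC) AS rn
    FROM Payroll AS p
    WHERE p.EffectiveDate <= p.DateWorked
)
SELECT EMPID, DateWorked, Hours, WageRate, EffectiveDate
FROM ranked
WHERE rn = 1;   -- keep only the latest applicable rate per work day
```

Unlike the MAX-and-join version, this returns exactly one row per work day even if two wage rows share the same effective date (which of the two is returned is then arbitrary).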

try this:
DECLARE @YourTable table (EMPID int, DateWorked datetime, Hours int
,WageRate numeric(6,2), EffectiveDate datetime)
INSERT INTO @YourTable VALUES (1,'1/1/2010' ,10, 7.00, '6/1/2009')
INSERT INTO @YourTable VALUES (1,'1/1/2010' ,10, 7.25, '6/10/2009')
INSERT INTO @YourTable VALUES (1,'1/1/2010' ,10, 8.00, '2/1/2010')
INSERT INTO @YourTable VALUES (1,'1/10/2010',10, 20.00,'12/1/2010')
INSERT INTO @YourTable VALUES (2,'1/1/2010' ,8 , 12.00, '2/1/2009')
SELECT
e.EMPID,e.WageRate,e.EffectiveDate
FROM @YourTable e
INNER JOIN (SELECT
EMPID,MAX(EffectiveDate) AS EffectiveDate
FROM @YourTable
WHERE EffectiveDate<GETDATE()+1
GROUP BY EMPID
) dt ON e.EMPID=dt.EMPID AND e.EffectiveDate=dt.EffectiveDate
ORDER BY e.EMPID
OUTPUT
EMPID WageRate EffectiveDate
----------- --------------------------------------- -----------------------
1 8.00 2010-02-01 00:00:00.000
2 12.00 2009-02-01 00:00:00.000
(2 row(s) affected)

Something like this ought to work:
SELECT T.* FROM T
INNER JOIN (
SELECT EMPID, DATEWORKED, MAX(EFFECTIVEDATE) EFFECTIVEDATE
FROM T
WHERE EFFECTIVEDATE <= DATEWORKED
GROUP BY EMPID, DATEWORKED) T2
ON T2.EMPID = T.EMPID
AND T2.DATEWORKED = T.DATEWORKED
AND T2.EFFECTIVEDATE = T.EFFECTIVEDATE

SELECT TOP 1 EMPID, WageRate
FROM wages
WHERE ......
ORDER BY EffectiveDate DESC
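For the second part of the question (joining the two original tables), the TOP 1 idea can be applied once per work row with OUTER APPLY. This is a rough sketch only: the table names HoursWorked and Wages and their column lists are assumptions, since the question does not name the original tables.

```sql
-- Assumed (hypothetical) original tables:
--   HoursWorked(EMPID, DateWorked, Hours)
--   Wages(EMPID, WageRate, EffectiveDate)
SELECT h.EMPID, h.DateWorked, h.Hours, w.WageRate, w.EffectiveDate
FROM HoursWorked AS h
OUTER APPLY (
    -- Latest rate for this employee that is not later than the date worked
    SELECT TOP (1) wg.WageRate, wg.EffectiveDate
    FROM Wages AS wg
    WHERE wg.EMPID = h.EMPID
      AND wg.EffectiveDate <= h.DateWorked
    ORDER BY wg.EffectiveDate DESC
) AS w;
```

OUTER APPLY keeps work rows that have no applicable rate yet (WageRate comes back NULL); CROSS APPLY would drop them instead.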

Related

SQL COUNT the number of purchases between a customer's first purchase and the following 10 months

Every customer has a different first-time purchase date. I want to COUNT the number of purchases each customer made in the 10 months following their first purchase.
sample table
TransactionID Client_name PurchaseDate Revenue
11 John Lee 10/13/2014 327
12 John Lee 9/15/2015 873
13 John Lee 11/29/2015 1,938
14 Rebort Jo 8/18/2013 722
15 Rebort Jo 5/21/2014 525
16 Rebort Jo 2/4/2015 455
17 Rebort Jo 3/20/2016 599
18 Tina Pe 10/8/2014 213
19 Tina Pe 6/10/2016 3,494
20 Tina Pe 8/9/2016 411
My code below just uses the ROW_NUMBER function to identify the first purchase, but I don't know how to do the calculation. Or is there a better way to do it?
SELECT client_name,
purchasedate,
Dateadd(month, 10, purchasedate) TenMonth,
Row_number()
OVER (
partition BY client_name
ORDER BY purchasedate) RM
FROM mytable
You might try something like this - I assume you're using SQL Server from the presence of DATEADD() and the fact that you're using a window function (ROW_NUMBER()):
WITH myCTE AS (
SELECT TransactionID, Client_name, PurchaseDate, Revenue
, MIN(PurchaseDate) OVER ( PARTITION BY Client_name ) AS min_PurchaseDate
FROM myTable
)
SELECT Client_name, COUNT(*)
FROM myCTE
WHERE PurchaseDate <= DATEADD(month, 10, min_PurchaseDate)
GROUP BY Client_name
Here I'm creating a common table expression (CTE) with all the data, including the date of first purchase, then I grab a count of all the purchases within a 10-month timeframe.
Hope this helps.
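To try the CTE approach against the sample rows from the question, something like the following should work. It is a sketch under a few assumptions: the temp table name #myTable is a stand-in, Revenue is stored as an integer without thousands separators, and multi-row VALUES (SQL Server 2008 or later) is available.

```sql
-- Sample data from the question, with unambiguous yyyymmdd date literals.
CREATE TABLE #myTable (TransactionID INT, Client_name VARCHAR(50),
                       PurchaseDate DATE, Revenue INT);
INSERT INTO #myTable VALUES
 (11, 'John Lee',  '20141013',  327),
 (12, 'John Lee',  '20150915',  873),
 (13, 'John Lee',  '20151129', 1938),
 (14, 'Rebort Jo', '20130818',  722),
 (15, 'Rebort Jo', '20140521',  525),
 (16, 'Rebort Jo', '20150204',  455),
 (17, 'Rebort Jo', '20160320',  599),
 (18, 'Tina Pe',   '20141008',  213),
 (19, 'Tina Pe',   '20160610', 3494),
 (20, 'Tina Pe',   '20160809',  411);

WITH myCTE AS (
    SELECT Client_name, PurchaseDate,
           MIN(PurchaseDate) OVER (PARTITION BY Client_name) AS min_PurchaseDate
    FROM #myTable
)
SELECT Client_name, COUNT(*) AS PurchasesInFirstTenMonths
FROM myCTE
WHERE PurchaseDate <= DATEADD(month, 10, min_PurchaseDate)
GROUP BY Client_name
ORDER BY Client_name;
-- John Lee 1, Rebort Jo 2, Tina Pe 1

DROP TABLE #myTable;
```

Note that the count includes the first purchase itself. If only the follow-up purchases are wanted, adding AND PurchaseDate > min_PurchaseDate to the WHERE clause would exclude it, at the cost of dropping clients who have no follow-up purchase in the window.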
Give this a whirl ... Subquery to get the min purchase date, then LEFT JOIN to the main table to have a WHERE clause for the ten month date range, then count.
SELECT mt.Client_name, COUNT(mt.PurchaseDate) as PurchaseCountFirstTenMonths
FROM myTable mt
LEFT JOIN (
SELECT Client_name, MIN(PurchaseDate) as MinPurchaseDate FROM myTable GROUP BY Client_name) mtmin
ON mt.Client_name = mtmin.Client_name
WHERE mt.PurchaseDate >= mtmin.MinPurchaseDate AND mt.PurchaseDate <= DATEADD(month, 10, mtmin.MinPurchaseDate)
GROUP BY mt.Client_name
ORDER BY mt.Client_name
By the way, I'm guessing there's some kind of ClientID involved, as a nine-character full name runs the risk of duplicates.

How to sum the hours using two Date Fields and group them by the user id in SQL

I feel like the task is straightforward, but I am having a hard time getting it to do what I want.
Here is a table in my database:
ID |Empl_Acc_ID |CheckIn |CheckOut |WeekDay
----------------------------------------------------------------------------
1 | 1 | 2017-09-24 08:03:02.143 | 2017-09-24 12:00:00.180 | Sun
2 | 1 | 2017-09-24 13:02:23.457 | 2017-09-24 17:01:02.640 | Sun
3 | 2 | 2017-09-24 08:05:23.457 | 2017-09-24 13:01:02.640 | Mon
4 | 2 | 2017-09-24 14:05:23.457 | 2017-09-24 17:00:02.640 | Mon
5 | 3 | 2017-09-24 07:05:23.457 | 2017-09-24 11:30:02.640 | Tue
6 | 3 | 2017-09-24 12:31:23.457 | 2017-09-24 16:01:02.640 | Tue
and so on....
I want to group Empl_Acc_ID by the same date and sum up the total hours each employee worked that day. Each employee could have either one or more records per day depending on how many breaks he/she took that day.
For example if Empl_Acc_ID (2) worked 3 different days with one break, the table will contain 6 records for that person but in my query I want to see 3 records with the total hours they worked each day.
Here is how I constructed the query:
select distinct w.Empl_Acc_ID, ws.fullWorkDayHours
from Work_Schedule as w
INNER JOIN (
SELECT Empl_Acc_ID, fullWorkDayHours = Sum(DATEDIFF(hour, w.CheckIn, w.CheckOut))
from Work_Schedule w
GROUP BY Empl_Acc_ID
) ws on w.Empl_Acc_ID = ws.Empl_Acc_ID
This query does not quite get me what I need. It only returns the sum of hours per employee for all the days they worked. Also, this query only has 2 columns, but I want to see more columns. When I tried adding more columns, the records were no longer distinct by Empl_Acc_ID.
What is wrong with the query?
Thank you
You do not need to self-join this table in that case; just group by the datetime field cast to a date.
create table Work_Schedule (
ID TINYINT,
Empl_Acc_ID TINYINT,
CheckIn DATETIME,
CheckOut DATETIME,
WeekDay CHAR(3)
);
INSERT INTO Work_Schedule VALUES (1, 1,'2017-09-24 08:03:02.143','2017-09-24 12:00:00.180','Sun');
INSERT INTO Work_Schedule VALUES (2, 1,'2017-09-24 13:02:23.457','2017-09-24 17:01:02.640','Sun');
INSERT INTO Work_Schedule VALUES (3, 2,'2017-09-24 08:05:23.457','2017-09-24 13:01:02.640','Mon');
INSERT INTO Work_Schedule VALUES (4, 2,'2017-09-24 14:05:23.457','2017-09-24 17:00:02.640','Mon');
INSERT INTO Work_Schedule VALUES (5, 3,'2017-09-24 07:05:23.457','2017-09-24 11:30:02.640','Tue');
INSERT INTO Work_Schedule VALUES (6, 3,'2017-09-24 12:31:23.457','2017-09-24 16:01:02.640','Tue');
SELECT w.Empl_Acc_ID,
CAST(CheckIn AS DATE) [date],
SUM(DATEDIFF(hour, w.CheckIn, w.CheckOut)) fullWorkDayHours
FROM Work_Schedule w
GROUP BY w.Empl_Acc_ID, CAST(CheckIn AS DATE)
DROP TABLE Work_Schedule;
Empl_Acc_ID date fullWorkDayHours
1 2017-09-24 8
2 2017-09-24 8
3 2017-09-24 8
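One caveat: DATEDIFF(hour, ...) counts the hour boundaries crossed, not elapsed time, so a check-in at 08:59 and check-out at 09:01 counts as one hour. If fractional hours matter, a variation like the following may be closer to what is wanted. It is a sketch against the same Work_Schedule table; the DECIMAL(6,2) precision is an arbitrary choice.

```sql
-- Sum elapsed minutes per employee per day, then convert to hours.
SELECT w.Empl_Acc_ID,
       CAST(w.CheckIn AS DATE) AS [date],
       CAST(SUM(DATEDIFF(minute, w.CheckIn, w.CheckOut)) / 60.0
            AS DECIMAL(6,2)) AS fullWorkDayHours
FROM Work_Schedule AS w
GROUP BY w.Empl_Acc_ID, CAST(w.CheckIn AS DATE);
```

DATEDIFF(minute, ...) still counts boundaries (minute boundaries this time), so the result can be off by up to a minute per record rather than up to an hour.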
Try this. You just have to group by date and employee account.
select Employee.Empl_Acc_ID, FirstName, LastName, Username,
convert(varchar(10), checkin, 101) as checkin,
convert(varchar(10), checkout, 101) as checkout,
sum(datediff(hour, checkin, checkout)) as hours
from Employee
inner join Employee_Account on Employee.Empl_Acc_ID = Employee_Account.Empl_Acc_ID
inner join Work_Schedule on Employee_Account.Empl_Acc_ID = Work_Schedule.Empl_Acc_ID
group by convert(varchar(10), checkin, 101), convert(varchar(10), checkout, 101),
Employee.Empl_Acc_ID, FirstName, LastName, Username
order by Employee.Empl_Acc_ID
You do not group by date, that's the issue:
SELECT DISTINCT w.Empl_Acc_ID, ws.fullWorkDayHours, ws.CheckInDate
FROM Work_Schedule as w
INNER JOIN (
SELECT Empl_Acc_ID, CAST(w.CheckIn AS DATE) AS [CheckInDate], fullWorkDayHours = Sum(DATEDIFF(hour, w.CheckIn, w.CheckOut))
from Work_Schedule w
GROUP BY Empl_Acc_ID, CAST(w.CheckIn AS DATE)
) ws on w.Empl_Acc_ID = ws.Empl_Acc_ID
No need to do a self join; it works fine without it:
Select distinct Empl_Acc_ID, Sum(DATEDIFF(hour,CheckIN,CheckOut)) As FullDayWorkHours
from EMP2
where DATEPART(day,CheckIn)=DATEPART(day,CheckOut)
Group By Empl_Acc_ID

Sql Server group by sets of columns

I have a data set where I need to count patient visits with such rules:
Two or more visits to the same doctor in the same day count as 1 visit, regardless of the reason
Two or more visits to different doctors for the same reason count as 1 visit
Two or more visits to different doctors on the same day for different reasons count as two or more visits.
Example data:
DoctorId PatientId VisitDate ReasonCode RowId
-------- --------- --------- ---------- -----
1 100 2014-01-01 200 1
1 100 2014-01-01 210 2
2 100 2014-01-01 200 3
2 100 2014-01-11 300 4
1 100 2014-01-15 200 5
2 400 2014-01-15 200 6
In this example, my final count would be based on grouping rowId 1, 2, 3 for 1 visit; grouping row 4 as 1 visit, grouping row 5 as 1 visit for a total of 3 visits for patient 100. Patient 400 has 1 visit as well.
patientid visitdate numberofvisits
--------- --------- --------------
100 2014-01-01 3
100 2014-01-11 1
100 2014-01-15 1
400 2014-01-15 1
Where I'm stuck is how to handle the group by so that I get the different scenarios covered. If the grouping were doctor, date, I'd be fine. If it were doctor, date, ReasonCode, I'd be fine. It's the logic of the DoctorId and the ReasonCode in the scenario where 2 doctors are involved, and DoctorId and date in the other when it's the same doctor. I've not been deeply into SQL Server in a long time, so it's possible that a common table expression is the solution and I'm not seeing it. I'm using SQL Server 2014 and there's decent latitude in performance. I am looking for a SQL Server query that produces the results above. As best I can tell, there's no way to group this the way I need it counted.
The answer was an except clause and grouping each of the sets before a final count. Sometimes, we over-complicate things.
DECLARE @tblAllData TABLE
(
DoctorId INT NOT NULL
, PatientId INT NOT NULL
, VisitDate DATE NOT NULL
, ReasonCode INT NOT NULL
, RowId INT NOT NULL
)
INSERT @tblAllData
SELECT
1,100,'2014-01-01',200,1
UNION ALL
SELECT
1,100,'2014-01-01',210,2
UNION ALL
SELECT
2,100,'2014-01-01',200,3
UNION ALL
SELECT
2,100,'2014-01-11',300,4
UNION ALL
SELECT
1,100,'2014-01-15',200,5
UNION ALL
SELECT
2,400,'2014-01-15',200,6
DECLARE @tblTempCountedRows AS TABLE
(
PatientId INT NOT NULL
, VisitDate DATE
, ReasonCode INT
)
INSERT @tblTempCountedRows
SELECT PatientId, VisitDate,0
FROM @tblAllData
GROUP BY PatientId, DoctorId, VisitDate
EXCEPT
SELECT PatientId, VisitDate, ReasonCode
FROM @tblAllData
GROUP BY PatientId, VisitDate, ReasonCode
select * from @tblTempCountedRows
DECLARE @tblFinalCountedRows AS TABLE
(
PatientId INT NOT NULL
, VisitCount INT
)
INSERT @tblFinalCountedRows
SELECT
PatientId
, count(1) as Member_visit_Count
FROM
@tblTempCountedRows
GROUP BY PatientId
SELECT * from @tblFinalCountedRows

Multiple rows of dates between using custom calendar

So banging my head against the wall and can't see the wood for the trees...
I've got two tables;
1. ID field, start date and end date columns.
2. Date and Workday columns.
I just need to count the days between the two dates for each row, using the dates in the second (calendar) table. Googling found plenty of examples without the dates table, and plenty of examples where it's just based on one start and end date.
Table_1 - Contains an entry for every id
id start_date end_date
123 01/01/2013 03/01/2013
456 02/01/2013 08/01/2013
789 06/01/2013 07/01/2013
Table_2 - Contains an entry for every day
e_day workday
01/01/2013 1
02/01/2013 0
03/01/2013 1
04/01/2013 1
05/01/2013 0
06/01/2013 1
07/01/2013 0
08/01/2013 0
Results
id start_date end_date days_between
123 01/01/2013 03/01/2013 2
456 02/01/2013 08/01/2013 3
789 06/01/2013 07/01/2013 1
I can find out the value for 1 id;
SELECT COUNT(workday) FROM table_2
WHERE workday = 1 AND e_day >= '01/01/2013'
AND e_day <= '03/01/2013';
Just not sure how to put this logic in to table_1.
IE (Clearly not correct)
SELECT
table_1.id,
table_1.start_date,
table_1.end_date,
(COUNT(table_2.workday) FROM table_2 WHERE table_2.workday = 1
AND table_2.e_day >= table_1.start_date
AND table_2.e_day <= table_2.end_date) AS days_between
FROM table_1
Code to generate bodged example tables;
CREATE TABLE #table_1(id INT, start_date SMALLDATETIME, end_date SMALLDATETIME);
CREATE TABLE #table_2(e_day SMALLDATETIME, workday BIT);
INSERT #table_1 VALUES (123,'01/01/2013','03/01/2013')
INSERT #table_1 VALUES (456,'02/01/2013','08/01/2013')
INSERT #table_1 VALUES (789,'06/01/2013','07/01/2013')
INSERT #table_2 VALUES ('01/01/2013',1)
INSERT #table_2 VALUES ('02/01/2013',0)
INSERT #table_2 VALUES ('03/01/2013',1)
INSERT #table_2 VALUES ('04/01/2013',1)
INSERT #table_2 VALUES ('05/01/2013',0)
INSERT #table_2 VALUES ('06/01/2013',1)
INSERT #table_2 VALUES ('07/01/2013',0)
INSERT #table_2 VALUES ('08/01/2013',0)
SELECT * FROM #table_1
SELECT * FROM #table_2
Code to remove tables;
DROP TABLE #table_1 DROP TABLE #table_2;
Thanks all for your help in advance :)
Try this:
select a.id,a.start_date,a.end_date,sum(cast(b.workday as tinyint)) as NumWorkDays,
count(*) as Total_days
from idTable a
join workdaytable b on b.e_day between a.start_date and a.end_date
group by a.id,a.start_date,a.end_date
To visualize what is happening
select a.id,a.start_date,a.end_date
from idTable a
where a.id=123
id start_date end_date
123 1/1/2013 3/1/2013
returns one row for id=123
Now, when we do the join, we add e_day and the workday flag columns AND we add one row for each e_day in the second table
id start_date end_date e_day work_day
123 1/1/2013 3/1/2013 1/1/2013 0
123 1/1/2013 3/1/2013 1/2/2013 1
123 1/1/2013 3/1/2013 1/3/2013 1
etc.
Now we have a big "table" with 5 columns and one row for each day in the second table that falls between 1/1/2013 and 3/1/2013. The SUM operation simply adds up all of the work_day flags from the "table" we created by the join. If you run the query without the GROUP BY (selecting the columns directly instead of the sum and count), you can see the "table" that gets created...
Hope this helps a bit...
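The correlated-subquery shape the question was reaching for can also be made to work. A minimal sketch, using the #table_1 and #table_2 temp tables from the question's setup script (the aliases t1 and t2 are just for illustration):

```sql
-- For each id, count the calendar rows flagged as workdays
-- that fall inside that id's start/end range (inclusive).
SELECT t1.id,
       t1.start_date,
       t1.end_date,
       (SELECT COUNT(*)
        FROM #table_2 AS t2
        WHERE t2.workday = 1
          AND t2.e_day >= t1.start_date
          AND t2.e_day <= t1.end_date) AS days_between
FROM #table_1 AS t1;
-- With the question's sample data: 123 -> 2, 456 -> 3, 789 -> 1
```

One difference from the join version: an id whose range matches no calendar rows still appears here with a count of 0, whereas an inner join would drop it.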

SQL join two record into one row with multiple column

I want to join two records (from the same table) into one row with multiple columns.
The employment history structure is as follows:
StaffID StartDate EndDate DeptID
==================================================
1 2010-10-01 2011-01-19 1
1 2011-01-20 2012-12-31 2
1 2013-01-01 2013-05-29 4
How can I join the two rows into one row if they have the same StaffID and the 2nd record's StartDate is 1 day after the EndDate of the 1st record (continuous employment)?
The output should look like this:
StaffID EffectiveDate New_DeptID Prev_DeptID
==================================================
1 2011-01-20 2 1
1 2013-01-01 4 2
The following is my SQL statement, but it doesn't work:
select distinct
ca1.StaffID,
ca1.ProjectDepartment as Prev_DeptID, ca1.StartDate, ca1.EndDate,
ca2.ProjectDepartment as New_DeptID, ca2.StartDate, ca2.EndDate
from
emp_hist as ca1,
emp_hist as ca2
where
(ca1.StaffID = ca2.StaffID)
and ca1.StartDate<>ca2.StartDate
and ca1.EndDate <>ca2.EndDate
and ca2.startdate= DATEADD(day, 1, ca1.enddate)
For example, here are two records (true data) in the table:
StaffID StartDate EndDate DeptID
===========================================================================
1 2010-04-12 12:00:00.000 2013-02-28 00:00:00.000 1
1 2013-03-01 12:00:00.000 2013-08-29 11:02:59.877 2
I cannot retrieve this record using my SQL statement.
Your problem is that the dates have a time component. You appear to be using SQL Server. You can fix your query by doing this:
select ca1.StaffID,
ca1.ProjectDepartment as Prev_DeptID, ca1.StartDate, ca1.EndDate,
ca2.ProjectDepartment as New_DeptID, ca2.StartDate, ca2.EndDate
from emp_hist as ca1 join
emp_hist as ca2
on ca1.StaffID = ca2.StaffID and
cast(ca1.StartDate as date) <> cast(ca2.StartDate as date) and
cast(ca1.EndDate as date) <> cast(ca2.EndDate as date) and
cast(ca2.startdate as date) = DATEADD(day, 1, cast(ca1.enddate as date));
I also replaced the implicit join with improved join syntax.
If you're using SQL Server 2012 or later, try the LAG function.
select distinct
ca1.StaffID,
ca1.EndDate,
ca1.ProjectDepartment as New_DeptID,
LAG(ca1.ProjectDepartment) OVER (PARTITION BY ca1.StaffId ORDER BY ca1.EndDate) as Prev_DeptID
from
emp_hist as ca1
If you're not, use the RANK function and a subquery
select
eh.StaffID,
eh.EndDate,
eh.ProjectDepartment as New_DeptID,
eh1.ProjectDepartment as Prev_DeptID
from
(select *, RANK() OVER (PARTITION BY StaffID ORDER BY EndDate) as rnk
from emp_hist) eh left join (
select distinct
StaffID,
EndDate,
ProjectDepartment,
RANK() OVER (PARTITION BY StaffID ORDER BY EndDate) as rnk
from
emp_hist) eh1 on eh1.StaffID=eh.StaffID and eh1.rnk=eh.rnk-1
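Neither window-function query above checks the "starts one day after the previous record ended" condition from the question. A possible way to combine LAG with that check is sketched below. It assumes SQL Server 2012 or later and uses the DeptID column from the sample table (swap in ProjectDepartment if that is the real column name); the CTE name ordered and the Prev_EndDate column are made up for illustration.

```sql
-- Pair each record with the previous one for the same staff member,
-- then keep only pairs where employment is continuous.
WITH ordered AS (
    SELECT StaffID, StartDate, DeptID,
           LAG(DeptID)  OVER (PARTITION BY StaffID ORDER BY StartDate) AS Prev_DeptID,
           LAG(EndDate) OVER (PARTITION BY StaffID ORDER BY StartDate) AS Prev_EndDate
    FROM emp_hist
)
SELECT StaffID,
       CAST(StartDate AS date) AS EffectiveDate,
       DeptID AS New_DeptID,
       Prev_DeptID
FROM ordered
-- Compare dates only, since the real data carries a time component.
WHERE CAST(StartDate AS date) = DATEADD(day, 1, CAST(Prev_EndDate AS date));
```

The first record for each StaffID has a NULL Prev_EndDate, so the WHERE clause filters it out without any extra condition.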