LAG function not returning desired results

I have this table:
CREATE TABLE [dbo].[testtable]
(
[EmpID] [int] NOT NULL,
[Status] [nvarchar](5) NOT NULL,
[History] [nvarchar](5) NOT NULL,
[EntryDate] DateTime NOT NULL
)
INSERT INTO [dbo].[testtable] ([EmpID], [Status], [History], EntryDate)
VALUES (1, 'N', 'OLD', '2022-03-01 13:00'),
(1, 'C', 'OLD', '2022-03-01 16:00'),
(1, 'C', 'OLD', '2022-04-01 16:00'),
(1, 'T', 'CUR', '2022-05-01 08:00'),
(2, 'N', 'OLD', '2022-04-01 16:00'),
(2, 'R', 'OLD', '2022-05-01 07:00'),
(2, 'F', 'OLD', '2022-06-01 15:00'),
(2, 'S', 'CUR', '2022-07-01 14:00'),
(3, 'N', 'CUR', '2022-03-01 17:00'),
(4, 'N', 'OLD', '2022-05-01 16:00'),
(4, 'F', 'OLD', '2022-06-01 11:00'),
(4, 'G', 'OLD', '2022-07-01 20:00'),
(4, 'G', 'CUR', '2022-08-01 19:00')
In my current output, the first record of each new EmpID picks up the last status of the previous EmpID as its prior status. EmpID 3 should not be included, since it has no status change.
SELECT
EMPID, FromSt, ToSt, History
FROM
(SELECT
EMPID,
ISNULL(LAG(Status) OVER (ORDER BY EMPID ASC), 'N') AS FromSt,
 Status AS ToSt,
History
FROM
[dbo].[testtable]
-- WHERE History = 'CUR'
) InnerQuery
WHERE
FromSt <> ToSt
Output:
EMPID FromSt ToSt
---------------------
1 N C
1 C T
2 T N
2 N R
2 R F
2 F S
3 S N
4 N F
4 F G
Each EmpID goes through various status changes. The oldest record per EmpID always has a Status of 'N' and a History of 'OLD', and the latest record always has a History value of 'CUR'.
Scenario #1
When I select all records, I would like the output to show only the rows where the status changes between consecutive records, as below:
EmpID FromSt ToSt
-------------------
1 N C
1 C T
2 N R
2 R F
2 F S
4 N G
Scenario #2
If I select only the 'CUR' rows, I want the output to pick the most recent earlier status that differs from the current one, and only where there is a status change. So again EmpID 3 would not be included.
EmpID FromSt ToSt
-----------------------
1 C T
2 F S
2 R S
4 F G

There are several problems with the code.
In order to evaluate LAG separately for each EMPID, you need to include a PARTITION BY clause in the OVER clause of the LAG function.
https://learn.microsoft.com/en-us/sql/t-sql/functions/lag-transact-sql?view=sql-server-ver16
You need to choose what order to sort the data in for LAG to use. There doesn't appear to be anything in your data (a date, perhaps) that would enable you to choose the order. Without this, the results won't be consistent, as SQL Server doesn't guarantee that results are returned in any particular order unless you tell it to. Plus, LAG requires an ORDER BY clause.

Thanks to dougp's advice, here is the solution:
SELECT
EMPID,FromSt,ToSt,History
FROM (SELECT EMPID,
ISNULL(LAG(Status) OVER(PARTITION BY Empid ORDER BY EmpID,EntryDate ),'N')As FromSt,  
Status AS ToSt,
History
FROM [dbo].[testtable]
--WHERE History='CUR'
) InnerQuery
WHERE FromSt<>ToSt
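To see the effect of PARTITION BY in isolation, here is a minimal sketch that replays the sample rows through Python's built-in sqlite3 module rather than SQL Server. It assumes SQLite 3.25 or later (needed for window functions), uses COALESCE in place of T-SQL's ISNULL, and the helper name status_changes is made up for this illustration.

```python
import sqlite3

# Sample rows from the question: (EmpID, Status, History, EntryDate)
ROWS = [
    (1, "N", "OLD", "2022-03-01 13:00"),
    (1, "C", "OLD", "2022-03-01 16:00"),
    (1, "C", "OLD", "2022-04-01 16:00"),
    (1, "T", "CUR", "2022-05-01 08:00"),
    (2, "N", "OLD", "2022-04-01 16:00"),
    (2, "R", "OLD", "2022-05-01 07:00"),
    (2, "F", "OLD", "2022-06-01 15:00"),
    (2, "S", "CUR", "2022-07-01 14:00"),
    (3, "N", "CUR", "2022-03-01 17:00"),
    (4, "N", "OLD", "2022-05-01 16:00"),
    (4, "F", "OLD", "2022-06-01 11:00"),
    (4, "G", "OLD", "2022-07-01 20:00"),
    (4, "G", "CUR", "2022-08-01 19:00"),
]


def status_changes(rows):
    """Return (EmpID, FromSt, ToSt) for each row whose status differs
    from the previous row of the same EmpID (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE testtable "
        "(EmpID INTEGER, Status TEXT, History TEXT, EntryDate TEXT)"
    )
    conn.executemany("INSERT INTO testtable VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT EmpID, FromSt, ToSt
        FROM (
            SELECT EmpID,
                   EntryDate,
                   -- LAG restarts for every EmpID because of PARTITION BY
                   COALESCE(LAG(Status) OVER (PARTITION BY EmpID
                                              ORDER BY EntryDate), 'N') AS FromSt,
                   Status AS ToSt
            FROM testtable
        ) AS InnerQuery
        WHERE FromSt <> ToSt
        ORDER BY EmpID, EntryDate
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in status_changes(ROWS):
        print(row)
```

With this data the first row of each EmpID gets FromSt = 'N' from the default and is filtered out, so no status leaks from one employee to the next. EmpID 4 comes out as N to F and F to G.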

Related

How to write a running total based on criteria in T-SQL

I'm building a report which gives me the total count of unique accounts within a calendar month.
However, this total is based on the number of active accounts (accounts subscribed to a service), and once their contract ends they will be excluded from the total count.
For example, Company A subscribed to the service on 1/1/2018 and their contract ends on 1/1/2020. So Company A should be included in the total count of unique accounts for every month they're under contract, until the contract ends.
The end result would look something like this:
Here is the SQL query that I have so far. How can I write the code so that it gives me this cumulative/running total? I added the columns for reference.
SELECT A.Name, CA.Name, CA.Start_Date__c, CA.End_Date__c, CA.Product_Code_CPQ__c
FROM [salesforce].[Client_Asset__c] AS CA
INNER JOIN salesforce.Account AS A
ON CA.Account__c = A.Id
WHERE Product_Code_CPQ__c IN(
'DSWPSTRSUB','DSWPESSSUB','DSWPPROSUB','DSWPHOSTSUB','DSWPMULTIHOSTSUB','DSWPOLXWRAPFPE',
'DSWPOLXWRAPSUB','WPCALENDARFORALT','WPCALHOSTINGBUN','IMWPTM','SBWPRET','SBWPRETNR','WORDPLUMWEBSUCCESS',
'WORDPWEBSUCCESS','WORDPOGS','FDSTRWORDPDESGNSUB','FDWPFPE','WORDPEMERGHOST','WORDPSUBBUN','WPOLXPLUGIN',
'POSTSTARTWORDPAF','POSTWORDPSTARTBUN','LUMWORDPSSUBBUN','WORDPLUMOGS','LUMFDSTRWPDESGNSUB',
'LUMPSTWORDPSTRBUN','LUMPOSTSTRTWORDPAF','FDWPEMERGFPE')
AND End_Date__c > GETDATE()
AND Active__c = 1

Try something like this:
CREATE TABLE #tmp ([month] INT, [group] VARCHAR(10), [value] REAL)
INSERT INTO #tmp ([month], [group], [value]) VALUES
(1, 'A', 1), (2, 'A', 5), (3, 'A', 3), (4, 'A', 2), (5, 'A', 8),
(1, 'B', 7), (2, 'B', 3), (3, 'B', 2), (4, 'B', 4), (5, 'B', 6)
SELECT c.[month], c.[group], c.current_total, r.running_total
FROM
(
SELECT [month],[group], SUM([value]) current_total
FROM #tmp
GROUP BY [month],[group]
) C JOIN
(
SELECT [month],[group], SUM([value]) OVER (partition BY [group] ORDER BY [month]) running_total
FROM #tmp
) R ON C.[month]=R.[month] AND C.[group]=R.[group]
ORDER BY 2,1
Tested on mssql 2016. Handle potential missing values yourself.
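As a smaller illustration of the windowed SUM used above, the sketch below computes the per-group running total in a single pass, without the join. It runs the same sample values through Python's sqlite3 module (an assumption made for portability; SQLite 3.25+ is needed for window functions), and the function name running_totals is hypothetical.

```python
import sqlite3


def running_totals():
    """Return (group, month, value, running_total) rows, ordered by group and month."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE tmp ("month" INTEGER, "group" TEXT, "value" REAL)')
    conn.executemany(
        "INSERT INTO tmp VALUES (?, ?, ?)",
        [
            (1, "A", 1), (2, "A", 5), (3, "A", 3), (4, "A", 2), (5, "A", 8),
            (1, "B", 7), (2, "B", 3), (3, "B", 2), (4, "B", 4), (5, "B", 6),
        ],
    )
    query = """
        SELECT "group", "month", "value",
               -- cumulative sum within each group, in month order
               SUM("value") OVER (PARTITION BY "group" ORDER BY "month") AS running_total
        FROM tmp
        ORDER BY "group", "month"
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in running_totals():
        print(row)
```

This only works as written because the sample has exactly one row per month and group. With several rows per month you would aggregate first, as the answer's inner GROUP BY query does.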

Select duplicate persons with duplicate memberships

SQL Fiddle with schema and my initial attempt.
CREATE TABLE person
([firstname] varchar(10), [surname] varchar(10), [dob] date, [personid] int);
INSERT INTO person
([firstname], [surname], [dob] ,[personid])
VALUES
('Alice', 'AA', '1/1/1990', 1),
('Alice', 'AA', '1/1/1990', 2),
('Bob' , 'BB', '1/1/1990', 3),
('Carol', 'CC', '1/1/1990', 4),
('Alice', 'AA', '1/1/1990', 5),
('Kate' , 'KK', '1/1/1990', 6),
('Kate' , 'KK', '1/1/1990', 7)
;
CREATE TABLE person_membership
([personid] int, [personstatus] varchar(1), [memberid] int);
INSERT INTO person_membership
([personid], [personstatus], [memberid])
VALUES
(1, 'A', 10),
(2, 'A', 20),
(3, 'A', 30),
(3, 'A', 40),
(4, 'A', 50),
(4, 'A', 60),
(5, 'T', 70),
(6, 'A', 80),
(7, 'A', 90);
CREATE TABLE membership
([membershipid] int, [memstatus] varchar(1));
INSERT INTO membership
([membershipid], [memstatus])
VALUES
(10, 'A'),
(20, 'A'),
(30, 'A'),
(40, 'A'),
(50, 'T'),
(60, 'A'),
(70, 'A'),
(80, 'A'),
(90, 'T');
There are three tables (as per the fiddle above). The person table contains duplicates: the same people entered more than once. For the purpose of this exercise, we assume that the combination of first name, surname and DoB is enough to uniquely identify a person.
I am trying to build a query that shows duplicate people (first name + surname + DoB) with two or more active entries in the person table (person_membership.personstatus = 'A') AND two or more active memberships (membership.memstatus = 'A').
Using the example from SQL Fiddle, the result of the query should be just Alice (two active person IDs, two active membership IDs).
I think I'm making progress with the following effort, but it looks rather cumbersome, and I need to remove Kate from the final result: she doesn't have a duplicate membership.
SELECT q.firstname, q.surname, q.dob, p1.personid, m.membershipid
FROM
(SELECT
p.firstname,p.surname,p.dob, count(*) as cnt
FROM
person p
GROUP BY
p.firstname,p.surname,p.dob
HAVING COUNT(1) > 1) as q
INNER JOIN person p1 ON q.firstname=p1.firstname AND q.surname=p1.surname AND q.dob=p1.dob
INNER JOIN person_membership pm ON p1.personid=pm.personid
INNER JOIN membership m ON pm.memberid = m.membershipid
WHERE pm.personstatus = 'A' AND m.memstatus = 'A'

Since you are using SQL Server, window functions will be handy for this scenario. The following will give you the expected output.
SELECT firstname,surname,dob,personid,memberid
from(
SELECT firstname,surname,dob,p.personid,memberid
,Rank() over(partition by p.firstname,p.surname,p.dob order by p.personid) rnasc
,Rank() over(partition by p.firstname,p.surname,p.dob order by p.personid desc) rndesc
FROM person p
INNER JOIN person_membership pm ON p.personid=pm.personid
INNER JOIN membership m ON pm.memberid = m.membershipid
where personstatus='A' and memstatus='A')a
where a.rnasc+rndesc>2

You have to add GROUP BY and HAVING clauses to return only the duplicate items:
SELECT
person.firstname,person.surname,person.dob
FROM
person, person_membership, membership
WHERE
person.personid=person_membership.personid AND person_membership.memberid = membership.membershipid
AND
person_membership.personstatus = 'A' AND membership.memstatus = 'A'
GROUP BY
person.firstname,person.surname,person.dob
HAVING COUNT(1) > 1
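One way to express the "two or more active person IDs AND two or more active memberships" requirement directly is to count distinct IDs per name group. The sketch below is not taken from either answer. It swaps in COUNT(DISTINCT ...) in the HAVING clause and runs against the fiddle data through Python's sqlite3 module. The function name duplicate_people and the ISO date strings are assumptions made for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE person (firstname TEXT, surname TEXT, dob TEXT, personid INTEGER);
INSERT INTO person VALUES
    ('Alice', 'AA', '1990-01-01', 1),
    ('Alice', 'AA', '1990-01-01', 2),
    ('Bob',   'BB', '1990-01-01', 3),
    ('Carol', 'CC', '1990-01-01', 4),
    ('Alice', 'AA', '1990-01-01', 5),
    ('Kate',  'KK', '1990-01-01', 6),
    ('Kate',  'KK', '1990-01-01', 7);
CREATE TABLE person_membership (personid INTEGER, personstatus TEXT, memberid INTEGER);
INSERT INTO person_membership VALUES
    (1, 'A', 10), (2, 'A', 20), (3, 'A', 30), (3, 'A', 40), (4, 'A', 50),
    (4, 'A', 60), (5, 'T', 70), (6, 'A', 80), (7, 'A', 90);
CREATE TABLE membership (membershipid INTEGER, memstatus TEXT);
INSERT INTO membership VALUES
    (10, 'A'), (20, 'A'), (30, 'A'), (40, 'A'), (50, 'T'),
    (60, 'A'), (70, 'A'), (80, 'A'), (90, 'T');
"""


def duplicate_people():
    """Return name groups with more than one active person ID
    and more than one active membership ID."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    query = """
        SELECT p.firstname, p.surname, p.dob
        FROM person p
        JOIN person_membership pm ON pm.personid = p.personid
        JOIN membership m ON m.membershipid = pm.memberid
        WHERE pm.personstatus = 'A' AND m.memstatus = 'A'
        GROUP BY p.firstname, p.surname, p.dob
        -- distinct counts keep one person with two memberships from qualifying
        HAVING COUNT(DISTINCT p.personid) > 1
           AND COUNT(DISTINCT m.membershipid) > 1
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(duplicate_people())
```

On the fiddle data this returns only Alice: Bob has two active memberships but a single person ID, and Kate's second person ID is attached to a terminated membership.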

SQL select items between LAG and LEAD using as range

Is it possible to select and sum items from a table, using LAG and LEAD over another table as the range, as below?
SELECT @Last = MAX(ID) from [dbo].[#Temp]
select opl.Name as [Age Categories] ,
( SELECT count([dbo].udfCalculateAge([BirthDate],GETDATE()))
FROM [dbo].[tblEmployeeDetail] ed
inner join [dbo].[tblEmployee] e
on ed.EmployeeID = e.ID
where convert(int,[dbo].udfCalculateAge(e.[BirthDate],GETDATE()))
between LAG(opl.Name) OVER (ORDER BY opl.id)
and (CASE opl.ID WHEN @Last THEN '100' ELSE opl.Name End )
) as Total
FROM [dbo].[#Temp] opl
tblEmployee contains the employees and their dates of birth
INSERT INTO #tblEmployees VALUES
(1, 'A', 'A1', 'A', '1983/01/02'),
(2, 'B', 'B1', 'BC', '1982/01/02'),
(3, 'C', 'C1', 'JR2', '1982/10/11'),
(4, 'V', 'V1', 'G', '1990/07/12'),
(5, 'VV', 'VV1', 'J', '1992/06/02'),
(6, 'R', 'A', 'D', '1982/05/15'),
(7, 'C', 'Ma', 'C', '1984/09/29')
The next table is a temp table that is created from the ages entered by the user, e.g. "20;30;50;60" generates the temp table below, using the Split function:
select * FROM [dbo].[Split](';','20;30;50;60')
Temp Table
pn s
1 20
2 30
3 50
4 60
The desired output is below, though the Age Categories column can be renamed in a DataTable in C#. I need the Total column to be accurate for the ranges.
Age Categories Total
up to 20 0
21 - 30 2
31 - 50 5
51 - 60 0

Something along these lines should work for you:
declare @tblEmployees table(
ID int,
FirstNames varchar(20),
Surname varchar(20),
Initial varchar(3),
BirthDate date)
INSERT INTO @tblEmployees VALUES
(1, 'A', 'A1', 'A', '1983/01/02'),
(2, 'B', 'B1', 'BC', '1982/01/02'),
(3, 'C', 'C1', 'JR2', '1982/10/11'),
(4, 'V', 'V1', 'G', '1990/07/12'),
(5, 'VV', 'VV1', 'J', '1992/06/02'),
(6, 'R', 'A', 'D', '1982/05/15'),
(7, 'C', 'Ma', 'C', '1984/09/29')
declare @temp table
(id int identity,
age int)
INSERT INTO @temp
SELECT cast(item as int) FROM dbo.fnSplit(';','20;30;50;60')
declare @today date = GetDate()
declare @minBirthCutOff date = (SELECT DATEADD(yy, -MAX(age), @today) FROM @temp)
declare @minBirth date = (SELECT Min(birthdate) from @tblEmployees)
-- add a catch-all bucket only if someone is older than the largest limit
IF @minBirth < @minBirthCutOff
BEGIN
INSERT INTO @temp VALUES (100)
end
SELECT COALESCE(CAST((LAG(t.age) OVER(ORDER BY t.age) + 1) as varchar(3))
+ ' - ','Up to ')
+ CAST(t.age AS varchar(3)) AS [Age Categories],
COUNT(e.id) AS [Total] FROM @temp t
LEFT JOIN
(SELECT te.id,
te.age,
-- smallest limit that is >= the employee's age, so each upper bound is inclusive
(SELECT MIN(age) FROM @temp t WHERE t.age >= te.age) AS agebucket
FROM (select id,
dbo.udfCalculateAge(birthdate,@today) age from @tblEmployees) te) e
ON e.agebucket = t.age
GROUP BY t.age ORDER BY t.age
Result set looks like this:
Age Categories Total
Up to 20 0
21 - 30 2
31 - 50 5
51 - 60 0
For future reference, particularly when asking SQL questions, you will get a far faster and better response if you provide much of the work that I have done here, i.e. CREATE statements for the tables concerned and INSERT statements to supply the sample data. It is much easier for you to do this than for us (we have to copy, paste, and then reformat), whereas you should be able to generate the same with a few choice SELECT statements!
Note also that I handled the case where a birthdate falls outside the given range rather differently. It is a bit more efficient to do a single check once via MAX than to complicate your SELECT statement. It also makes the query much more readable.
Thanks to HABO for the suggestion on GetDate().
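The core idea, turning a sorted list of limits into ranges with LAG and counting the ages that fall inside each, can be tried without the udfCalculateAge and fnSplit helpers. The sketch below is a simplified variant and not the query above. It uses Python's sqlite3 module (SQLite 3.25+ assumed), takes precomputed ages as input, and the function name bucket_counts and the sample ages are invented for the example.

```python
import sqlite3


def bucket_counts(limits, ages):
    """Count how many ages fall into each range derived from the sorted limits."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE limits (age INTEGER)")
    conn.execute("CREATE TABLE employee_age (id INTEGER, age INTEGER)")
    conn.executemany("INSERT INTO limits VALUES (?)", [(a,) for a in limits])
    conn.executemany(
        "INSERT INTO employee_age VALUES (?, ?)", list(enumerate(ages, start=1))
    )
    query = """
        WITH buckets AS (
            SELECT age AS upper_age,
                   -- previous limit + 1; NULL for the first bucket
                   LAG(age) OVER (ORDER BY age) + 1 AS lower_age
            FROM limits
        )
        SELECT CASE WHEN b.lower_age IS NULL
                    THEN 'Up to ' || b.upper_age
                    ELSE b.lower_age || ' - ' || b.upper_age
               END AS age_category,
               COUNT(e.id) AS total
        FROM buckets b
        LEFT JOIN employee_age e
               ON e.age <= b.upper_age
              AND (b.lower_age IS NULL OR e.age >= b.lower_age)
        GROUP BY b.upper_age, b.lower_age
        ORDER BY b.upper_age
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # Hypothetical ages chosen to sit on the bucket boundaries.
    for row in bucket_counts([20, 30, 50, 60], [20, 21, 30, 31, 50, 60]):
        print(row)
```

Computing the lower bound in a separate CTE keeps the window function apart from the aggregation, which makes the boundary handling (both ends inclusive) easy to check.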

Need Help on Sql Query - how to get the recently deleted records

Please help me, I am new to SQL.
Could you please let me know how to get the recently deleted records from the table below? I need the query to consider only rows where the same Number value appears in more than one row.
I need a result like this:
ID FName LName Number CreateDate
2 BBBBB B 111111 06-26-2016 01:18:000
3 CCCCC C 333333 06-25-2016 06:10:000
4 DDDDD D 444444 06-25-2016 06:10:000
5 EEEEE E 555555 06-25-2016 23:10:000
7 FFFFF F 777777 06-26-2016 00:01:000
8 GGGGG G 888888 06-26-2016 16:01:000
9 HHHHH H 999999 06-26-2016 23:01:000
Create Table Users1
(
ID int,
FName varchar (50),
LName Varchar (50),
Number varchar(10),
CreateDate Datetime
)
INSERT INTO Users1 Values (1,'AAAA','A','11111','06-25-2016 00:10:765')
INSERT INTO Users1 Values (2,'AAAA','A','11111','06-26-2016 01:18:000')
INSERT INTO Users1 Values (3,'CCCC','C','33333','06-25-2016 06:10:000')
INSERT INTO Users1 Values (4,'DDDD','D','44444','06-25-2016 06:10:000')
INSERT INTO Users1 Values (5,'EEEE','E','55555','06-25-2016 23:10:000')
INSERT INTO Users1 Values (6,'CCCC','C','33333','06-25-2016 00:01:000')
INSERT INTO Users1 Values (7,'FFFF','F','77777','06-26-2016 00:01:000')
INSERT INTO Users1 Values (8,'GGGG','G','88888','06-26-2016 16:01:000')
INSERT INTO Users1 Values (9,'HHHH','H','99999','06-26-2016 23:01:000')

With Users1 As (
SELECT * FROM (
VALUES
(1, 'AAAA', 'A', '11111', '06-25-2016 00:10:765'),
(2, 'AAAA', 'A', '11111', '06-26-2016 01:18:000'),
(3, 'CCCC', 'C', '33333', '06-25-2016 06:10:000'),
(4, 'DDDD', 'D', '44444', '06-25-2016 06:10:000'),
(5, 'EEEE', 'E', '55555', '06-25-2016 23:10:000'),
(6, 'CCCC', 'C', '33333', '06-25-2016 00:01:000'),
(7, 'FFFF', 'F', '77777', '06-26-2016 00:01:000'),
(8, 'GGGG', 'G', '88888', '06-26-2016 16:01:000'),
(9, 'HHHH', 'H', '99999', '06-26-2016 23:01:000')
) V (ID, FName, LName, Number, CreateDate)
), Users1WithVersionNumber As (
Select *
, row_number() Over (Partition By Number Order By CreateDate DESC) As VersionNumber
From Users1
)
Select FName, LName, Number, CreateDate
From Users1WithVersionNumber
Where VersionNumber = 1 --< Take the latest version only
AND Number In (
Select Number
From Users1WithVersionNumber
Where VersionNumber = 2 --< There is at least one other version
)
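The same "latest version, but only when an older version exists" filter can also be written with a second window function instead of the IN subquery. The sketch below is that variant, not the answer's exact query. It runs on a subset of the sample rows through Python's sqlite3 module, with the timestamps rewritten as ISO strings so they sort correctly as text (an assumption of this example), and the function name latest_of_duplicates is hypothetical.

```python
import sqlite3

# Subset of the sample rows: (ID, FName, LName, Number, CreateDate)
ROWS = [
    (1, "AAAA", "A", "11111", "2016-06-25 00:10"),
    (2, "AAAA", "A", "11111", "2016-06-26 01:18"),
    (3, "CCCC", "C", "33333", "2016-06-25 06:10"),
    (4, "DDDD", "D", "44444", "2016-06-25 06:10"),
    (6, "CCCC", "C", "33333", "2016-06-25 00:01"),
]


def latest_of_duplicates(rows):
    """Return (ID, Number, CreateDate) of the newest row for every Number
    that occurs more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Users1 "
        "(ID INTEGER, FName TEXT, LName TEXT, Number TEXT, CreateDate TEXT)"
    )
    conn.executemany("INSERT INTO Users1 VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT ID, Number, CreateDate
        FROM (
            SELECT ID, Number, CreateDate,
                   ROW_NUMBER() OVER (PARTITION BY Number
                                      ORDER BY CreateDate DESC) AS VersionNumber,
                   COUNT(*) OVER (PARTITION BY Number) AS Versions
            FROM Users1
        ) AS v
        WHERE VersionNumber = 1   -- newest row per Number
          AND Versions > 1        -- only Numbers that have an older row too
        ORDER BY ID
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_of_duplicates(ROWS):
        print(row)
```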

SELECT CreateDate,FName,LName,Number FROM Users1
WHERE CreateDate in ( SELECT MAX(CreateDate) from Users1 group by Number)

How to group rows by their DATEDIFF?

I hope you can help me.
I need to display the records in the HH_Solution_Audit table if 2 or more staff enter the room within 10 minutes. Here are the requirements:
1. Display only the events that have a timestamp (LAST_UPDATED) interval of less than or equal to 10 minutes. Therefore, I must compare the current row to the next row and the previous row to check whether their DATEDIFF is less than or equal to 10 minutes. I'm done with this part.
2. Show only the records where the number of distinct STAFF_GUID values in the room within that 10-minute interval is at least 2.
HH_Solution_Audit Table Details:
ID - PK
STAFF_GUID - staff id
LAST_UPDATED - datetime when a staff enters a room
Here's what I got so far. This satisfies requirement # 1 only.
CREATE TABLE HH_Solution_Audit (
ID INT PRIMARY KEY,
STAFF_GUID NVARCHAR(1),
LAST_UPDATED DATETIME
)
GO
INSERT INTO HH_Solution_Audit VALUES (1, 'b', '2013-04-25 9:01')
INSERT INTO HH_Solution_Audit VALUES (2, 'b', '2013-04-25 9:04')
INSERT INTO HH_Solution_Audit VALUES (3, 'b', '2013-04-25 9:13')
INSERT INTO HH_Solution_Audit VALUES (4, 'a', '2013-04-25 10:15')
INSERT INTO HH_Solution_Audit VALUES (5, 'a', '2013-04-25 10:30')
INSERT INTO HH_Solution_Audit VALUES (6, 'a', '2013-04-25 10:33')
INSERT INTO HH_Solution_Audit VALUES (7, 'a', '2013-04-25 10:41')
INSERT INTO HH_Solution_Audit VALUES (8, 'a', '2013-04-25 11:02')
INSERT INTO HH_Solution_Audit VALUES (9, 'a', '2013-04-25 11:30')
INSERT INTO HH_Solution_Audit VALUES (10, 'a', '2013-04-25 11:45')
INSERT INTO HH_Solution_Audit VALUES (11, 'a', '2013-04-25 11:46')
INSERT INTO HH_Solution_Audit VALUES (12, 'a', '2013-04-25 11:51')
INSERT INTO HH_Solution_Audit VALUES (13, 'a', '2013-04-25 12:24')
INSERT INTO HH_Solution_Audit VALUES (14, 'b', '2013-04-25 12:27')
INSERT INTO HH_Solution_Audit VALUES (15, 'b', '2013-04-25 13:35')
DECLARE @numOfPeople INT = 2,
--minimum number of people that must be inside
--the room for @lengthOfStay minutes
@lengthOfStay INT = 10,
--number of minutes of stay
@dateFrom DATETIME = '04/25/2013 00:00',
@dateTo DATETIME = '04/25/2013 23:59';
WITH cteSource AS
(
SELECT ID, STAFF_GUID, LAST_UPDATED,
ROW_NUMBER() OVER (ORDER BY LAST_UPDATED) AS row_num
FROM HH_SOLUTION_AUDIT
WHERE LAST_UPDATED >= @dateFrom AND LAST_UPDATED <= @dateTo
)
SELECT [current].ID, [current].STAFF_GUID, [current].LAST_UPDATED
FROM
cteSource AS [current]
LEFT OUTER JOIN
cteSource AS [previous] ON [current].row_num = [previous].row_num + 1
LEFT OUTER JOIN
cteSource AS [next] ON [current].row_num = [next].row_num - 1
WHERE
DATEDIFF(MINUTE, [previous].LAST_UPDATED, [current].LAST_UPDATED)
<= @lengthOfStay
OR
DATEDIFF(MINUTE, [current].LAST_UPDATED, [next].LAST_UPDATED)
<= @lengthOfStay
ORDER BY [current].ID, [current].LAST_UPDATED
Running the query returns IDs:
1, 2, 3, 5, 6, 7, 10, 11, 12, 13, 14
That satisfies requirement # 1 of having less than or equal to 10 minutes interval between the previous row, current row and next row.
Can you help me with the 2nd requirement? If it's applied, the returned IDs should only be:
13, 14

Here's an idea. You don't need ROW_NUMBER or the previous and next records. You just need two queries unioned: one looking for everyone who has someone checked in within X minutes before them, and another looking X minutes ahead. Each uses a correlated subquery and COUNT(*) to find the number of matching people. If that number meets your numOfPeople threshold, the row qualifies.
EDIT: new version. Instead of running two queries, 10 minutes ahead and 10 minutes behind, we only check 10 minutes behind, selecting the rows that match in cteLastOnes. Another part of the query then searches for the rows that actually fall within those 10 minutes. Finally, we union them with the 'last ones'.
WITH cteSource AS
(
SELECT ID, STAFF_GUID, LAST_UPDATED
FROM HH_SOLUTION_AUDIT
WHERE LAST_UPDATED >= @dateFrom AND LAST_UPDATED <= @dateTo
)
,cteLastOnes AS
(
SELECT * FROM cteSource c1
WHERE @numOfPeople -1 <= (SELECT COUNT(DISTINCT STAFF_GUID)
FROM cteSource c2
WHERE DATEADD(MI,@lengthOfStay,c2.LAST_UPDATED) > c1.LAST_UPDATED
AND c2.LAST_UPDATED <= c1.LAST_UPDATED
AND c1.STAFF_GUID <> c2.STAFF_GUID)
)
SELECT * FROM cteLastOnes
UNION
SELECT * FROM cteSource s
WHERE EXISTS (SELECT * FROM cteLastOnes l
WHERE DATEADD(MI,@lengthOfStay,s.LAST_UPDATED) > l.LAST_UPDATED
AND s.LAST_UPDATED <= l.LAST_UPDATED
AND s.STAFF_GUID <> l.STAFF_GUID)
SQLFiddle DEMO - new version
SQLFiddle DEMO - old version
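For experimenting with the second requirement outside SQL Server, here is a rough sketch that uses a simpler rule than the union query above: a row qualifies when at least num_people distinct staff (itself included) have an event within the given number of minutes before or after it. This is a different technique from the answer's, and for thresholds above 2 the two-sided window may not match the intended meaning. It uses Python's sqlite3 module, zero-padded ISO timestamps, and the hypothetical function name crowded_events.

```python
import sqlite3

# Sample rows from the question: (ID, STAFF_GUID, LAST_UPDATED)
EVENTS = [
    (1, "b", "2013-04-25 09:01"), (2, "b", "2013-04-25 09:04"),
    (3, "b", "2013-04-25 09:13"), (4, "a", "2013-04-25 10:15"),
    (5, "a", "2013-04-25 10:30"), (6, "a", "2013-04-25 10:33"),
    (7, "a", "2013-04-25 10:41"), (8, "a", "2013-04-25 11:02"),
    (9, "a", "2013-04-25 11:30"), (10, "a", "2013-04-25 11:45"),
    (11, "a", "2013-04-25 11:46"), (12, "a", "2013-04-25 11:51"),
    (13, "a", "2013-04-25 12:24"), (14, "b", "2013-04-25 12:27"),
    (15, "b", "2013-04-25 13:35"),
]


def crowded_events(events, num_people=2, minutes=10):
    """Return IDs of events that have at least num_people distinct staff
    within +/- minutes of their timestamp (the event's own staff counts)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE audit (ID INTEGER PRIMARY KEY, STAFF_GUID TEXT, LAST_UPDATED TEXT)"
    )
    conn.executemany("INSERT INTO audit VALUES (?, ?, ?)", events)
    query = """
        SELECT a.ID
        FROM audit a
        WHERE (
            SELECT COUNT(DISTINCT b.STAFF_GUID)
            FROM audit b
            -- compare timestamps as whole seconds since the Unix epoch
            WHERE ABS(CAST(strftime('%s', b.LAST_UPDATED) AS INTEGER)
                    - CAST(strftime('%s', a.LAST_UPDATED) AS INTEGER)) <= :minutes * 60
        ) >= :num_people
        ORDER BY a.ID
    """
    rows = conn.execute(query, {"minutes": minutes, "num_people": num_people}).fetchall()
    conn.close()
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(crowded_events(EVENTS))
```

On the sample data only IDs 13 and 14 qualify, since they are the only events from different staff that fall within 10 minutes of each other.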
