Selecting duplicates based on some criteria


I have a table that records the activation and deactivation of a module by users. For each user, I would like to select only the row with the maximum time difference between activation and deactivation.
user activation Deactivation usage dmb
Pejman 2005-10-04 14:02:00 2012-09-28 18:29:00 23 198
Pejman 2006-05-30 12:44:00 2012-09-28 18:29:00 34 198
Pejman 2009-11-18 11:06:00 2012-09-28 18:29:00 64 198
Shahin 2005-02-11 10:53:00 2012-09-28 18:29:00 52 323
Shahin 2012-06-24 08:35:00 2012-09-28 18:29:00 17 323
Shahin 2005-02-24 14:22:00 2006-01-16 09:03:00 19 323
Kazem 2008-01-21 22:32:00 2011-01-01 00:00:00 73 666
Kazem 2008-01-21 22:35:00 2012-09-28 18:29:00 62 666
My desired output:
user activation Deactivation usage dmb
Pejman 2005-10-04 14:02:00 2012-09-28 18:29:00 23 198
Shahin 2005-02-11 10:53:00 2012-09-28 18:29:00 52 323
Kazem 2008-01-21 22:35:00 2012-09-28 18:29:00 62 666
To find the duration (the difference between activation and deactivation), I am using the following SQL (MSSQL):
datediff(d, activation, isnull(deactivation, getdate())) AS Time_Difference
Edit:
My table is the result of a WITH (common table expression) query. To produce the first table, I wrote a query like the following:
with mytab_cte([user], editionID, size, CompanyName, cvr, postcode, agreement, ActivationT, DeactivationT)
as
(
select ops.Opsaetning as opsaetning, A.EditionID as editionID, b.FilStr as size, v.Navn as CompanyName, v.CVR as cvr, v.Postnr as postcode, A.AftaleNr, BMA.Tilmeldt as ActivationT, BMA.Frameldes as DeactivationT
from BilagsAttachment as b
inner join Opsaetning as ops on b.Opsaetning = ops.Opsaetning
inner join Virksomhed as v on v.Aftalenr = ops.Aftalenr
inner join Aftalenr as A on A.AftaleNr = v.Aftalenr
inner join BenyttetModulAftalenr as BMA on BMA.Aftalenr = ops.Aftalenr
where ISNULL(v.SkalFaktureres, 0) = 1
AND ISNULL(v.ProeveVirksomhed, 0) = 0
AND A.Spaerret = 0
AND (A.FrameldingsDato IS NULL
OR A.FrameldingsDato > GETDATE())
AND v.TilSletning = 0
AND V.Email
)

This looks like SQL Server, in which case:
select * from
(
select *,
row_number() over
(
partition by [user]
order by datediff(d, activation, isnull(deactivation, getdate())) desc
) as rn
from yourtable
) v
where rn = 1
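To check the ROW_NUMBER approach end to end, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module rather than SQL Server. Assumptions: SQLite 3.25 or later (for window functions); a julianday() difference stands in for DATEDIFF, COALESCE for ISNULL, and a fixed, hypothetical `now` value for GETDATE(); the table name `modules` is made up.

```python
import sqlite3

# Rows from the question: (user, activation, deactivation, usage, dmb)
ROWS = [
    ("Pejman", "2005-10-04 14:02:00", "2012-09-28 18:29:00", 23, 198),
    ("Pejman", "2006-05-30 12:44:00", "2012-09-28 18:29:00", 34, 198),
    ("Pejman", "2009-11-18 11:06:00", "2012-09-28 18:29:00", 64, 198),
    ("Shahin", "2005-02-11 10:53:00", "2012-09-28 18:29:00", 52, 323),
    ("Shahin", "2012-06-24 08:35:00", "2012-09-28 18:29:00", 17, 323),
    ("Shahin", "2005-02-24 14:22:00", "2006-01-16 09:03:00", 19, 323),
    ("Kazem", "2008-01-21 22:32:00", "2011-01-01 00:00:00", 73, 666),
    ("Kazem", "2008-01-21 22:35:00", "2012-09-28 18:29:00", 62, 666),
]

# Number each user's rows from longest to shortest span and keep rn = 1.
# COALESCE(deactivation, :now) plays the role of ISNULL(deactivation, GETDATE()).
QUERY = """
SELECT "user", activation, deactivation, usage, dmb
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY "user"
               ORDER BY julianday(COALESCE(deactivation, :now)) - julianday(activation) DESC
           ) AS rn
    FROM modules
)
WHERE rn = 1
ORDER BY "user"
"""


def longest_activation_per_user(rows, now="2012-09-28 18:29:00"):
    # `modules` and the default `now` are hypothetical stand-ins.
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            'CREATE TABLE modules ("user" TEXT, activation TEXT, '
            "deactivation TEXT, usage INTEGER, dmb INTEGER)"
        )
        con.executemany("INSERT INTO modules VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY, {"now": now}).fetchall()
    finally:
        con.close()


for row in longest_activation_per_user(ROWS):
    print(row)
```

Note that the start date goes first: `julianday(end) - julianday(start)`, like `DATEDIFF(d, start, end)`, is positive for a real duration, so ordering it DESC puts the longest span first.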

Try this:
;WITH DiffCTE
AS
(
SELECT *,
datediff(d, activation, isnull(deactivation, getdate()))
AS Time_Difference
FROM TableName
), MaxDiffs
AS
(
SELECT t1.*
FROM DiffCTE t1
INNER JOIN
(
SELECT [user], MAX(Time_Difference) MAXTime_Difference
FROM DiffCTE
GROUP BY [user]
) t2 ON t2.[user] = t1.[user] AND t1.Time_Difference = t2.MAXTime_Difference
)
SELECT [user], activation, Deactivation, usage, dmb
FROM MaxDiffs;
Alternatively, you can use a ranking function:
;WITH cte
AS
(
SELECT *, ROW_NUMBER() OVER(Partition by [user] ORDER BY datediff(d, activation, isnull(deactivation, getdate())) DESC) row_num
FROM #t
)
SELECT * FROM cte WHERE row_num = 1;
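A similar SQLite sketch of the join-to-MAX approach from the first query in this answer, run through Python's sqlite3 module with the same assumptions (julianday() instead of DATEDIFF, a hypothetical `modules` table and fixed `now`). Unlike ROW_NUMBER, it returns every row that ties for a user's maximum duration.

```python
import sqlite3

# Rows from the question: (user, activation, deactivation, usage, dmb)
ROWS = [
    ("Pejman", "2005-10-04 14:02:00", "2012-09-28 18:29:00", 23, 198),
    ("Pejman", "2006-05-30 12:44:00", "2012-09-28 18:29:00", 34, 198),
    ("Pejman", "2009-11-18 11:06:00", "2012-09-28 18:29:00", 64, 198),
    ("Shahin", "2005-02-11 10:53:00", "2012-09-28 18:29:00", 52, 323),
    ("Shahin", "2012-06-24 08:35:00", "2012-09-28 18:29:00", 17, 323),
    ("Shahin", "2005-02-24 14:22:00", "2006-01-16 09:03:00", 19, 323),
    ("Kazem", "2008-01-21 22:32:00", "2011-01-01 00:00:00", 73, 666),
    ("Kazem", "2008-01-21 22:35:00", "2012-09-28 18:29:00", 62, 666),
]

# Step 1: each row's duration in days. Step 2: each user's maximum.
# Step 3: join back, comparing the row (d) with the maximum (m).
QUERY = """
WITH diffs AS (
    SELECT *,
           julianday(COALESCE(deactivation, :now)) - julianday(activation) AS time_difference
    FROM modules
),
max_diffs AS (
    SELECT "user", MAX(time_difference) AS max_time_difference
    FROM diffs
    GROUP BY "user"
)
SELECT d."user", d.activation, d.deactivation, d.usage, d.dmb
FROM diffs AS d
JOIN max_diffs AS m
  ON m."user" = d."user"
 AND d.time_difference = m.max_time_difference
ORDER BY d."user", d.activation
"""


def rows_with_max_duration(rows, now="2012-09-28 18:29:00"):
    # `modules` and the default `now` are hypothetical stand-ins.
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            'CREATE TABLE modules ("user" TEXT, activation TEXT, '
            "deactivation TEXT, usage INTEGER, dmb INTEGER)"
        )
        con.executemany("INSERT INTO modules VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY, {"now": now}).fetchall()
    finally:
        con.close()


for row in rows_with_max_duration(ROWS):
    print(row)
```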

Related

Missing dates for specific identifiers without adding extra dates when this identifier is no longer in the database SQL

To put the problem in words: I have a massive table that contains data for every subscriber for every day. If a subscriber no longer exists, they have no more records; for example, if SUB123 no longer exists from 28/10/2021, this subscriber has a record for every day up to and including 27/10/2021. The problem is that some subscribers have missing dates, perhaps because of weekends or other issues. I want to fill in these missing records with NULL values so that they are on record.
The current problem:
Subscriber Date Rev
sub123 25/10/2021 256
sub456 25/10/2021 282
sub123 26/10/2021 652
sub123 27/10/2021 396
sub456 28/10/2021 132
sub456 29/10/2021 484
sub456 01/11/2021 96
sub456 02/11/2021 45
The desired solution:
Subscriber Date Rev
sub123 25/10/2021 256
sub456 25/10/2021 282
sub123 26/10/2021 652
sub456 26/10/2021 NULL
sub123 27/10/2021 396
sub456 27/10/2021 NULL
sub456 28/10/2021 132
sub456 29/10/2021 484
sub456 30/10/2021 NULL
sub456 31/10/2021 NULL
sub456 01/11/2021 96
sub456 02/11/2021 45
My current attempt:
WITH all_dates as (
SELECT
CAST(date_column AS DATE) date_column, b.subscriber, b.date
FROM
(VALUES
(SEQUENCE(
min(b.date) OVER (PARTITION BY b.subscriber ORDER BY b.date),
max(b.date) OVER (PARTITION BY b.subscriber ORDER BY b.date),
INTERVAL '1' DAY)
)
) AS t1(date_array)
CROSS JOIN
UNNEST(date_array) AS t2(date_column)
LEFT JOIN MAINTABLE b
on t2.date_column = b.date
),
customer_dates as (
SELECT distinct a.subscriber, a.date, b.date_column
from MAINTABLE a
left join all_dates b
on a.date = b.date_column
)
SELECT *
from customer_dates a
This code doesn't work, but it shows what I am trying to accomplish. If I use the code below instead, it generates dates for all subscribers across one fixed range (from the initial date to the end date), which is not what I want; that is why I attempted the code above.
WITH all_dates as (
SELECT
CAST(date_column AS DATE) date_column, b.subscriber, b.date
FROM
(VALUES
(SEQUENCE(
date('2021-10-25'),
date('2022-04-30'),
INTERVAL '1' DAY)
)
) AS t1(date_array)
CROSS JOIN
UNNEST(date_array) AS t2(date_column)
LEFT JOIN MAINTABLE b
on t2.date_column = b.date
),
customer_dates as (
SELECT distinct a.subscriber, a.date, b.date_column
from MAINTABLE a
left join all_dates b
on a.date = b.date_column
)
SELECT *
from customer_dates a

You can use the lag function to find each subscriber's previous date, generate the missing range between the two dates with sequence, flatten it with unnest, and set Rev to NULL for the generated dates:
-- sample data
WITH dataset (Subscriber, Date, Rev) AS (
VALUES ('sub123', date_parse('25-10-2021', '%d-%m-%Y'), 256),
('sub456', date_parse('25-10-2021', '%d-%m-%Y'), 282),
('sub123', date_parse('26-10-2021', '%d-%m-%Y'), 652),
('sub123', date_parse('27-10-2021', '%d-%m-%Y'), 396),
('sub456', date_parse('28-10-2021', '%d-%m-%Y'), 132),
('sub456', date_parse('29-10-2021', '%d-%m-%Y'), 484),
('sub456', date_parse('01-11-2021', '%d-%m-%Y'), 96),
('sub456', date_parse('02-11-2021', '%d-%m-%Y'), 45)
)
-- query
select subscriber, lifted_date as date, if(date = lifted_date, rev, NULL) rev
from
(
select Subscriber,
Rev,
cast(date as date) date,
lag(cast(date as date)) over(partition by Subscriber order by date) prev_date
from dataset
)
cross join unnest(
array_except(sequence(coalesce(prev_date, date), date, interval '1' day), array[prev_date])
) as t(lifted_date)
order by subscriber, date
Output:
subscriber date rev
sub123 2021-10-25 00:00:00.000 256
sub123 2021-10-26 00:00:00.000 652
sub123 2021-10-27 00:00:00.000 396
sub456 2021-10-25 00:00:00.000 282
sub456 2021-10-26 00:00:00.000 NULL
sub456 2021-10-27 00:00:00.000 NULL
sub456 2021-10-28 00:00:00.000 132
sub456 2021-10-29 00:00:00.000 484
sub456 2021-10-30 00:00:00.000 NULL
sub456 2021-10-31 00:00:00.000 NULL
sub456 2021-11-01 00:00:00.000 96
sub456 2021-11-02 00:00:00.000 45
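SQLite has no sequence() or unnest(), so this sketch, run through Python's sqlite3 module, swaps in a different technique for the same goal: a recursive CTE that builds each subscriber's own calendar from their first to their last recorded date, then LEFT JOINs the data back. The table name `subs` and the ISO 'YYYY-MM-DD' date strings are assumptions.

```python
import sqlite3

# Rows from the question, with dates rewritten as ISO 'YYYY-MM-DD' strings
ROWS = [
    ("sub123", "2021-10-25", 256),
    ("sub456", "2021-10-25", 282),
    ("sub123", "2021-10-26", 652),
    ("sub123", "2021-10-27", 396),
    ("sub456", "2021-10-28", 132),
    ("sub456", "2021-10-29", 484),
    ("sub456", "2021-11-01", 96),
    ("sub456", "2021-11-02", 45),
]

# bounds: first and last recorded day per subscriber.
# calendar: one row per day inside each subscriber's own bounds only,
# so no dates are added after a subscriber's last record.
QUERY = """
WITH RECURSIVE bounds AS (
    SELECT subscriber, MIN("date") AS first_day, MAX("date") AS last_day
    FROM subs
    GROUP BY subscriber
),
calendar(subscriber, cal_day, last_day) AS (
    SELECT subscriber, first_day, last_day FROM bounds
    UNION ALL
    SELECT subscriber, date(cal_day, '+1 day'), last_day
    FROM calendar
    WHERE cal_day < last_day
)
SELECT c.subscriber, c.cal_day, s.rev
FROM calendar AS c
LEFT JOIN subs AS s
  ON s.subscriber = c.subscriber AND s."date" = c.cal_day
ORDER BY c.subscriber, c.cal_day
"""


def fill_missing_days(rows):
    # `subs` is a hypothetical table name; missing days come back with rev = None.
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE subs (subscriber TEXT, "date" TEXT, rev INTEGER)')
        con.executemany("INSERT INTO subs VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in fill_missing_days(ROWS):
    print(row)
```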

SQL Server - SUM and comma-separated values using GROUP BY clause

I have 2 tables:
NDEvent:
EventId EndTime
33 2020-10-23 15:00:00.000
33 2020-10-23 15:00:00.000
35 2020-10-21 03:30:00.000
35 2020-10-24 15:00:00.000
35 2020-10-25 15:00:00.000
34 2020-10-23 15:00:00.000
EventAppointment:
Id DocId EventId Amount
1 7647 34 10.00
2 7647 34 10.00
3 28531 33 20.00
4 7647 35 20.00
5 7647 35 100.00
6 7647 35 200.00
I want the result to look like this:
DocId EventId Amount Id
7647 34 20.00 1,2
28531 33 20.00 3
7647 35 320.00 4,5,6
What I have tried is:
select e.Amount,e.DoctorId,e.EventId,
Id= STUFF(
(SELECT DISTINCT ',' + CAST(e.Id as nvarchar(max))
from NDEvent nd
inner join EventAppointment e on nd.Id = e.EventId
where
GETDATE() > nd.EndTime
GROUP BY
e.Amount,e.DoctorId,e.EventId,e.Id
FOR XML PATH(''))
, 1, 1, ''
)
from NDEvent nd
inner join EventAppointment e on nd.Id = e.EventId
where
GETDATE() > nd.EndTime
GROUP BY
e.Amount,e.DoctorId,e.EventId
But it does not give the expected result.
Could anyone help with this query, or point me in the right direction? Thank you.

It doesn't look like you need the NDEvent table here at all (though I include it in the sample data). Just use SUM and STRING_AGG against EventAppointment (STRING_AGG and its WITHIN GROUP clause require SQL Server 2017 or later):
USE Sandbox
GO
WITH NDEvent AS(
SELECT *
FROM (VALUES(33,CONVERT(datetime,'2020-10-23T15:00:00.000')),
(33,CONVERT(datetime,'2020-10-23T15:00:00.000')),
(35,CONVERT(datetime,'2020-10-21T03:30:00.000')),
(35,CONVERT(datetime,'2020-10-24T15:00:00.000')),
(35,CONVERT(datetime,'2020-10-25T15:00:00.000')),
(34,CONVERT(datetime,'2020-10-23T15:00:00.000')))V(EventID,EndTime)),
EventAppointment AS(
SELECT *
FROM (VALUES(1,7647 ,34,10.00),
(2,7647 ,34,10.00),
(3,28531,33,20.00),
(4,7647 ,35,20.00),
(5,7647 ,35,100.00),
(6,7647 ,35,200.00))V(Id,DocId, EventID, Amount))
SELECT DocID,
EventID,
SUM(Amount) AS Amount,
STRING_AGG(Id,',') WITHIN GROUP (ORDER BY Id) AS IDs
FROM EventAppointment EA
GROUP BY DocId,
EventID;
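A rough SQLite analogue of the SUM and STRING_AGG query, as a sketch run through Python's sqlite3 module. group_concat plays the role of STRING_AGG; the inner ORDER BY usually yields the Ids in ascending order, but SQLite does not guarantee it. Table and column names follow the question's EventAppointment table.

```python
import sqlite3

# EventAppointment rows from the question: (Id, DocId, EventId, Amount)
APPOINTMENTS = [
    (1, 7647, 34, 10.00),
    (2, 7647, 34, 10.00),
    (3, 28531, 33, 20.00),
    (4, 7647, 35, 20.00),
    (5, 7647, 35, 100.00),
    (6, 7647, 35, 200.00),
]

# SUM the amounts and build a comma-separated Id list per (DocId, EventId).
# The concatenation order is not guaranteed by SQLite.
QUERY = """
SELECT DocId, EventId, SUM(Amount) AS Amount, group_concat(Id, ',') AS Ids
FROM (SELECT * FROM EventAppointment ORDER BY Id)
GROUP BY DocId, EventId
ORDER BY EventId
"""


def sum_and_concat(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE EventAppointment "
            "(Id INTEGER, DocId INTEGER, EventId INTEGER, Amount REAL)"
        )
        con.executemany("INSERT INTO EventAppointment VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in sum_and_concat(APPOINTMENTS):
    print(row)
```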

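Another approach, which also works on SQL Server versions older than 2017 (which lack STRING_AGG), builds the list with STUFF and FOR XML PATH: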
WITH Table1 AS(
SELECT EventId FROM NDEvent
GROUP BY EventId
),
Table2 AS(
SELECT e.DocId,e.EventId,e.Amount,
STUFF((
SELECT ',' + CAST(ee.Id as nvarchar)
FROM EventAppointment ee
where ee.EventId = e.EventId
GROUP BY ee.EventId,ee.Id
ORDER BY ee.Id
FOR XML PATH('')), 1, 1, '') AS Id
FROM Table1 t
LEFT OUTER JOIN EventAppointment e ON t.EventId = e.EventId
)
SELECT DocId,EventId,SUM(Amount) AS Amount,Id FROM Table2
GROUP BY DocId,EventId,Id

select rows with events related with another events in the same query column

I need to select the rows with EventTypeID = 19 that do not have a related EventTypeID = 21 row for the same EmployeeID logged exactly 4 minutes earlier. Here is my query and some raw output:
SELECT * FROM
(
SELECT rcp..EventLogEntries.EmployeeID, rcp..EventLogEntries.EventTypeID, rcp..EventLogEntries.TerminalID, rcp..EventLogEntries.LoggedOn
FROM rcp..EventLogEntries
WHERE rcp..EventLogEntries.terminalid = 3
UNION
SELECT viso..AccessUserPersons.UserExternalIdentifier, rcp..EventTypes.ID, rcp..Terminals.ID, viso..EventLogEntries.LoggedOn
FROM viso..EventLogEntries, viso..AccessUserPersons, rcp..Terminals, rcp..EventTypes
WHERE viso..EventLogEntries.LocationID = 10
AND viso..EventLogEntries.EventCode = 615
AND rcp..EventTypes.Code = 36
AND viso..EventLogEntries.PersonID = viso..AccessUserPersons.ID
AND viso..EventLogEntries.locationID = rcp..Terminals.TerminalTAID
) results
ORDER BY LoggedOn
EmployeeID EventTypeID TerminalID LoggedOn
273 19 3 2018-12-04 12:31:23.000
273 21 3 2018-12-04 12:34:18.000
483 19 3 2018-12-04 12:40:10.000
268 19 3 2018-12-04 13:19:23.000
273 21 3 2018-12-04 13:28:00.000
273 19 3 2018-12-04 13:32:00.000
459 19 3 2018-12-04 15:01:04.000
What I need to achieve is:
EmployeeID EventTypeID TerminalID LoggedOn
273 19 3 2018-12-04 12:31:23.000
483 19 3 2018-12-04 12:40:10.000
268 19 3 2018-12-04 13:19:23.000
459 19 3 2018-12-04 15:01:04.000
The TerminalID column is always 3 in this scenario and is not part of any query condition, but it must be in the output because later processing requires it.

Joining tables through conditions in the WHERE clause is bad practice; put join conditions in JOIN ... ON and use WHERE primarily for filtering.
Aliases also make the code shorter.
SELECT * FROM
(
SELECT EmployeeID, EventTypeID, TerminalID, LoggedOn
FROM rcp..EventLogEntries
WHERE terminalid = 3
UNION
SELECT p.UserExternalIdentifier, et.ID, t.ID, el.LoggedOn
FROM viso..EventLogEntries el
JOIN viso..AccessUserPersons p ON el.PersonID = p.ID
JOIN rcp..Terminals t ON el.locationID = t.TerminalTAID
CROSS JOIN rcp..EventTypes et -- no join condition exists for EventTypes
WHERE el.LocationID = 10
AND el.EventCode = 615
AND et.Code = 36
) results
ORDER BY LoggedOn
Then try the following:
WITH cteData AS
(
SELECT EmployeeID, EventTypeID, TerminalID, LoggedOn
FROM rcp..EventLogEntries
WHERE terminalid = 3
UNION
SELECT p.UserExternalIdentifier, et.ID, t.ID, el.LoggedOn
FROM viso..EventLogEntries el
JOIN viso..AccessUserPersons p ON el.PersonID = p.ID
JOIN rcp..Terminals t ON el.locationID = t.TerminalTAID
CROSS JOIN rcp..EventTypes et -- no join condition exists for EventTypes
WHERE el.LocationID = 10
AND el.EventCode = 615
AND et.Code = 36
)
SELECT q19.*
FROM
(
SELECT *
FROM cteData
WHERE EventTypeID=19
) q19
LEFT JOIN
(
SELECT *
FROM cteData
WHERE EventTypeID=21
) q21
ON q19.EmployeeID=q21.EmployeeID
WHERE (DATEDIFF(MINUTE,q19.LoggedOn,q21.LoggedOn)>4 OR q21.LoggedOn IS NULL)
If no join condition is needed (as with EventTypes here), use CROSS JOIN.
I got your result using your data:
WITH cteData AS(
SELECT *
FROM (VALUES
(273,19,3,CAST('2018-12-04 12:31:23.000' AS datetime)),
(273,21,3,CAST('2018-12-04 12:34:18.000' AS datetime)),
(483,19,3,CAST('2018-12-04 12:40:10.000' AS datetime)),
(268,19,3,CAST('2018-12-04 13:19:23.000' AS datetime)),
(273,21,3,CAST('2018-12-04 13:28:00.000' AS datetime)),
(273,19,3,CAST('2018-12-04 13:32:00.000' AS datetime)),
(459,19,3,CAST('2018-12-04 15:01:04.000' AS datetime))
)v(EmployeeID,EventTypeID,TerminalID,LoggedOn)
)
SELECT q19.*
FROM
(
SELECT *
FROM cteData
WHERE EventTypeID=19
) q19
LEFT JOIN
(
SELECT *
FROM cteData
WHERE EventTypeID=21
) q21
ON q19.EmployeeID=q21.EmployeeID
WHERE (DATEDIFF(MINUTE,q19.LoggedOn,q21.LoggedOn)>4 OR q21.LoggedOn IS NULL)

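A small change to the join makes the check exact: match each EventTypeID = 19 row only to an EventTypeID = 21 row for the same employee logged exactly 4 minutes earlier, and keep the rows that have no such match: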
ON q19.EmployeeID=q21.EmployeeID AND q19.LoggedOn=dateadd(mi, 4, q21.LoggedOn)
WHERE q21.LoggedOn IS NULL
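The exact-4-minutes anti-join can be checked with a small SQLite sketch run from Python's sqlite3 module. It assumes LoggedOn is stored as 'YYYY-MM-DD HH:MM:SS' text without milliseconds, so values compare equal to what datetime(..., '+4 minutes') returns; datetime() stands in for DATEADD, and the table name `events` is hypothetical.

```python
import sqlite3

# Raw rows from the question: (EmployeeID, EventTypeID, TerminalID, LoggedOn),
# with the .000 milliseconds dropped to match datetime()'s output format.
EVENTS = [
    (273, 19, 3, "2018-12-04 12:31:23"),
    (273, 21, 3, "2018-12-04 12:34:18"),
    (483, 19, 3, "2018-12-04 12:40:10"),
    (268, 19, 3, "2018-12-04 13:19:23"),
    (273, 21, 3, "2018-12-04 13:28:00"),
    (273, 19, 3, "2018-12-04 13:32:00"),
    (459, 19, 3, "2018-12-04 15:01:04"),
]

# LEFT JOIN each type-19 event to a type-21 event of the same employee
# logged exactly 4 minutes earlier; keep only rows with no match.
QUERY = """
WITH q19 AS (SELECT * FROM events WHERE EventTypeID = 19),
     q21 AS (SELECT * FROM events WHERE EventTypeID = 21)
SELECT q19.EmployeeID, q19.EventTypeID, q19.TerminalID, q19.LoggedOn
FROM q19
LEFT JOIN q21
  ON q19.EmployeeID = q21.EmployeeID
 AND q19.LoggedOn = datetime(q21.LoggedOn, '+4 minutes')
WHERE q21.LoggedOn IS NULL
ORDER BY q19.LoggedOn
"""


def type19_without_type21(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE events "
            "(EmployeeID INTEGER, EventTypeID INTEGER, TerminalID INTEGER, LoggedOn TEXT)"
        )
        con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in type19_without_type21(EVENTS):
    print(row)
```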

How to get data with a must different conditions?

I have tables with the following structure:
USERS :
USERID,NAME
TIMETB :
CHECKTIME,Sysdate,modified,USERID
Sample data:
USERS :
USERID NAME
434 moh
77 john
66 yara
TIMETB :
CHECKTIME USERID modified
2015-12-21 07:20:00.000 434 0
2015-12-21 08:39:00.000 434 2
2015-12-22 07:31:00.000 434 0
2015-12-21 06:55:00.000 77 0
2015-12-21 07:39:00.000 77 0
2015-12-25 07:11:00.000 66 0
2015-12-25 07:22:00.000 66 0
2015-12-25 07:50:00.000 66 2
2015-12-26 07:40:00.000 66 2
2015-12-26 07:21:00.000 66 2
I want to get the users who have two or more transactions with different modified values on the same date.
The result I expect is:
CHECKTIME USERID modified NAME
2015-12-21 07:20:00.000 434 0 moh
2015-12-21 08:39:00.000 434 2 moh
2015-12-25 07:11:00.000 66 0 yara
2015-12-25 07:22:00.000 66 0 yara
2015-12-25 07:50:00.000 66 2 yara
I wrote the following query, but it returns more than I expect: it also returns users whose transactions on the same date all have the same modified value.
SELECT a.CHECKTIME,
a.Sysdate,
(CASE WHEN a.modified = 0 THEN 'ADD' ELSE 'DELETE' END) AS modified,
b.BADGENUMBER,
b.name,
a.Emp_num AS Creator
FROM TIMETB a
INNER JOIN Users b ON a.USERID = b.USERID
WHERE YEAR(checktime) = 2015
AND MONTH(checktime) = 12
AND (
SELECT COUNT(*)
FROM TIMETB cc
WHERE cc.USERID = a.USERID
AND CONVERT(DATE, cc.CHECKTIME) = CONVERT(DATE, a.CHECKTIME)
AND cc.modified IN (0, 2)
) >= 2
AND a.modified IS NOT NULL
AND a.Emp_num IS NOT NULL

You can use window functions for this:
select t.*
from (select t.*,
count(*) over (partition by userid, cast(checktime as date)) as cnt
from timetb t
) t
where cnt >= 2;
If you want the name, just join in the appropriate table.
EDIT:
If you want different values of a column, a simple way is to compare the min and max values:
select t.*
from (select t.*,
min(modified) over (partition by userid, cast(checktime as date)) as minm,
max(modified) over (partition by userid, cast(checktime as date)) as maxm
from timetb t
) t
where minm <> maxm;
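A sketch comparing both versions of this answer in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). date(checktime) stands in for CAST(checktime AS date), and the table names follow the question. The count-based query also returns user 77's rows and user 66's 26 December rows, whose modified values are all equal, which is exactly the problem the question describes; the MIN/MAX version does not.

```python
import sqlite3

USERS = [(434, "moh"), (77, "john"), (66, "yara")]
# TIMETB rows from the question: (CHECKTIME, USERID, modified)
CHECKS = [
    ("2015-12-21 07:20:00", 434, 0),
    ("2015-12-21 08:39:00", 434, 2),
    ("2015-12-22 07:31:00", 434, 0),
    ("2015-12-21 06:55:00", 77, 0),
    ("2015-12-21 07:39:00", 77, 0),
    ("2015-12-25 07:11:00", 66, 0),
    ("2015-12-25 07:22:00", 66, 0),
    ("2015-12-25 07:50:00", 66, 2),
    ("2015-12-26 07:40:00", 66, 2),
    ("2015-12-26 07:21:00", 66, 2),
]

# First version: any user/day with two or more rows, regardless of values.
COUNT_QUERY = """
SELECT checktime, userid, modified
FROM (SELECT *, COUNT(*) OVER (PARTITION BY userid, date(checktime)) AS cnt
      FROM timetb)
WHERE cnt >= 2
ORDER BY checktime
"""

# Edited version: keep a user/day only if its modified values differ.
MIXED_QUERY = """
SELECT t.checktime, t.userid, t.modified, u.name
FROM (
    SELECT *,
           MIN(modified) OVER (PARTITION BY userid, date(checktime)) AS minm,
           MAX(modified) OVER (PARTITION BY userid, date(checktime)) AS maxm
    FROM timetb
) AS t
JOIN users AS u ON u.userid = t.userid
WHERE t.minm <> t.maxm
ORDER BY t.checktime
"""


def run_query(query):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE users (userid INTEGER, name TEXT)")
        con.execute("CREATE TABLE timetb (checktime TEXT, userid INTEGER, modified INTEGER)")
        con.executemany("INSERT INTO users VALUES (?, ?)", USERS)
        con.executemany("INSERT INTO timetb VALUES (?, ?, ?)", CHECKS)
        return con.execute(query).fetchall()
    finally:
        con.close()


for row in run_query(MIXED_QUERY):
    print(row)
```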

How to determine the maximum value for each category in SQL?

My table has records like these:
ID EmpID EffectiveDate PayElement Amount ComputeType AddDeduction
42 ISIPL001 2010-04-16 00:00:00.000 Basic 8000.00 On Attendance Addition
43 ISIPL001 2010-04-01 00:00:00.000 Con 2000.00 On Attendance Addition
44 ISIPL001 2010-04-01 00:00:00.000 HRA 2000.00 On Attendance Addition
54 ISIPL001 2011-01-01 00:00:00.000 Basic 15000.00 On Attendance Addition
55 ISIPL001 2011-01-01 00:00:00.000 Con 6000.00 On Attendance Addition
57 ISIPL001 2011-01-01 00:00:00.000 HRA 6000.00 On Attendance Addition
61 ISIPL001 2010-07-10 00:00:00.000 Basic 12000.00 On Attendance Addition
66 ISIPL001 2010-07-10 00:00:00.000 HRA 4200.00 On Attendance Addition
68 ISIPL001 2010-07-10 00:00:00.000 Con 5600.00 On Attendance Addition
For each pay element in my table, I need the record with the latest EffectiveDate.
So my output should look like this:
ID PayElement Amount
54 Basic 15000
55 Con 6000
57 HRA 6000

Try this:
SELECT ID,
PayElement,
Amount
FROM (
SELECT a.*,
RANK() OVER(PARTITION BY PayElement ORDER BY EffectiveDate DESC) AS rn
FROM <YOUR_TABLE> a
) a
WHERE rn = 1

;with cte as
(
select *,
row_number() over(partition by PayElement order by EffectiveDate desc) as rn
from YourTable
)
select
ID,
PayElement,
Amount
from cte
where rn = 1
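The ROW_NUMBER approach can be sketched in SQLite through Python's sqlite3 module (SQLite 3.25 or later for window functions). EffectiveDate is stored as ISO date text so that ORDER BY ... DESC sorts chronologically; the table name `pay` is hypothetical.

```python
import sqlite3

# Rows from the question: (ID, EmpID, EffectiveDate, PayElement, Amount);
# ComputeType and AddDeduction are omitted because the query does not use them.
PAY = [
    (42, "ISIPL001", "2010-04-16", "Basic", 8000.00),
    (43, "ISIPL001", "2010-04-01", "Con", 2000.00),
    (44, "ISIPL001", "2010-04-01", "HRA", 2000.00),
    (54, "ISIPL001", "2011-01-01", "Basic", 15000.00),
    (55, "ISIPL001", "2011-01-01", "Con", 6000.00),
    (57, "ISIPL001", "2011-01-01", "HRA", 6000.00),
    (61, "ISIPL001", "2010-07-10", "Basic", 12000.00),
    (66, "ISIPL001", "2010-07-10", "HRA", 4200.00),
    (68, "ISIPL001", "2010-07-10", "Con", 5600.00),
]

# Number each PayElement's rows from newest to oldest and keep the newest.
QUERY = """
SELECT ID, PayElement, Amount
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY PayElement ORDER BY EffectiveDate DESC) AS rn
    FROM pay
)
WHERE rn = 1
ORDER BY ID
"""


def latest_per_pay_element(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE pay (ID INTEGER, EmpID TEXT, EffectiveDate TEXT, "
            "PayElement TEXT, Amount REAL)"
        )
        con.executemany("INSERT INTO pay VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in latest_per_pay_element(PAY):
    print(row)
```

ROW_NUMBER keeps exactly one row per PayElement even when two rows share the latest date, whereas RANK keeps all of them. If the real table holds several employees, you would presumably add EmpID to the PARTITION BY; the sample only contains ISIPL001.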

Try this.
select
T.ID,
T.PayElement,
T.Amount
from
Test T inner join (select MAX(T_DATE.EffectiveDate) as MAX_DATE, T_DATE.PayElement from Test T_DATE group by T_DATE.PayElement) T_DATE on (T.PayElement = T_DATE.PayElement) and (T.EffectiveDate = T_DATE.MAX_DATE)
order by
T.ID

Select a.Id,
a.PayElement,
a.Amount
From dbo.YourTable a
Join
(
Select PayElement,
Max(EffectiveDate) as[MaxDate]
From dbo.YourTable
Group By PayElement
)b on a.PayElement = b.PayElement
And a.EffectiveDate = b.MaxDate

Try something like:
Select
a.ID, a.PayElement, a.Amount
From MyTable a
Inner Join (
Select PayElement, max(EffectiveDate) as MaxDate From MyTable Group By PayElement
) sub on a.EffectiveDate = sub.MaxDate and a.PayElement = sub.PayElement

select
a.Id, a.PayElement, a.Amount
from
YourTable a
inner join
(select
PayElement, max(EffectiveDate) as EffectiveDate
from
YourTable
group by
PayElement) b
on
a.PayElement = b.PayElement
and a.EffectiveDate = b.EffectiveDate
