MSSQL - Delete duplicate rows using common column values - sql

I haven't used SQL in quite a while, so I'm a bit lost here. I want to check for rows with duplicate values in the "Duration" and "date" columns and remove them from the query results. I need to keep the rows where Status = "Transfer", since these hold more information about the call and how it was routed through our system.
I want to use this for a dashboard, which would include counting the total number of calls from that query, which is why I cannot keep both rows of a duplicate pair.
Here's the (Simplified) code used:
SELECT status, user, duration, phonenumber, date
FROM (SELECT * FROM view_InboundPhoneCalls) as Phonecalls
WHERE date>=DATEADD(dd, -15, getdate())
--GROUP BY duration
Which gives something of the sort:
Status    User           Duration  phonenumber                        date
--------  -------------  --------  ---------------------------------  -------------------
Received  Receptionnist  00:34:03  from: +1234567890                  2021-09-30 16:01:57
Received  Receptionnist  00:03:12  from: +9876543210                  2021-09-30 16:02:40
Transfer  User1          00:05:12  +14161654965;Receptionnist;User1   2021-09-30 16:01:57
Received  Receptionnist  00:05:12  from: +14161654965                 2021-09-30 16:01:57
The end result would be something like this:
Status    User           Duration  phonenumber                        date
--------  -------------  --------  ---------------------------------  -------------------
Received  Receptionnist  00:34:03  from: +1234567890                  2021-09-30 16:01:57
Received  Receptionnist  00:03:12  from: +9876543210                  2021-09-30 16:02:40
Transfer  Receptionnist  00:05:12  +14161654965;Receptionnist;User1   2021-09-30 16:01:57

The normal "trick" is to detect duplicates first. One of the easier ways is a CTE (Common Table Expression) along with the ROW_NUMBER() function.
Part One - Mark the duplicates
WITH
cte_Sorted_List
(
    status, usertype, duration, phonenumber, dated, duplicate_check
)
AS
(   -- only pull the required fields, to speed things up
    SELECT status, [user], duration, phonenumber, [date],
        -- the marking depends on partitioning by the correct columns:
        -- the duplicates here share only duration and date
        ROW_NUMBER() OVER
        (
            PARTITION BY duration, [date]
            -- with the correct sort order
            -- bit of a hack: as 'T' comes after 'R',
            -- sorting status descending puts 'Transfer' first
            -- logic: the record to keep gets row number 1 in each duplicate list
            ORDER BY status DESC
        ) AS duplicate_check
    FROM view_InboundPhoneCalls
    -- and lose all unnecessary data
    WHERE [date] >= DATEADD(dd, -15, GETDATE())
)
Part two - show relevant rows
SELECT status, usertype, duration, phonenumber, dated
FROM cte_Sorted_List
WHERE duplicate_check = 1;
The CTE extracts the required fields in a single pass; only that data is then used for the output.

You could go for a blacklist, say with a CTE, then filter out the undesired rows.
Something like:
WITH Blacklist ([date], [duration]) AS (
    SELECT [date], [duration]
    FROM view_InboundPhoneCalls
    GROUP BY [date], [duration]
    HAVING COUNT(*) > 1
)
SELECT Phonecalls.status, Phonecalls.[user], Phonecalls.duration,
       Phonecalls.phonenumber, Phonecalls.[date]
FROM view_InboundPhoneCalls AS Phonecalls
LEFT JOIN Blacklist
       ON Phonecalls.[date] = Blacklist.[date]
      AND Phonecalls.[duration] = Blacklist.[duration]
WHERE Blacklist.[date] IS NULL
   OR (Blacklist.[date] IS NOT NULL AND Phonecalls.[status] = 'Transfer')
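The blacklist approach can be exercised end-to-end with the question's four sample rows. This is a minimal sketch run against SQLite rather than SQL Server; the table stands in for view_InboundPhoneCalls, and the date filter is left out so the 2021 sample data survives.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE view_InboundPhoneCalls
               (status TEXT, [user] TEXT, duration TEXT, phonenumber TEXT, [date] TEXT)""")
con.executemany("INSERT INTO view_InboundPhoneCalls VALUES (?,?,?,?,?)", [
    ("Received", "Receptionnist", "00:34:03", "from: +1234567890",  "2021-09-30 16:01:57"),
    ("Received", "Receptionnist", "00:03:12", "from: +9876543210",  "2021-09-30 16:02:40"),
    ("Transfer", "User1",         "00:05:12", "+14161654965;Receptionnist;User1", "2021-09-30 16:01:57"),
    ("Received", "Receptionnist", "00:05:12", "from: +14161654965", "2021-09-30 16:01:57"),
])

# Build the blacklist of (date, duration) pairs that occur more than once,
# then keep a blacklisted row only when it is the 'Transfer' one.
rows = con.execute("""
    WITH Blacklist ([date], [duration]) AS (
        SELECT [date], [duration]
        FROM view_InboundPhoneCalls
        GROUP BY [date], [duration]
        HAVING COUNT(*) > 1
    )
    SELECT p.status, p.[user], p.duration, p.phonenumber, p.[date]
    FROM view_InboundPhoneCalls AS p
    LEFT JOIN Blacklist AS b
           ON p.[date] = b.[date] AND p.[duration] = b.[duration]
    WHERE b.[date] IS NULL
       OR p.status = 'Transfer'
""").fetchall()
for r in rows:
    print(r)
```

The duplicated (00:05:12, 2021-09-30 16:01:57) pair survives only as its Transfer row; the other three rows pass through untouched. One caveat of this design: a duplicate group with no Transfer row would be dropped entirely.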

You can use row-numbering for this, along with a custom ordering. There is no need for any joins.
SELECT status, [user], duration, phonenumber, date
FROM (
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY duration, date
ORDER BY CASE WHEN Status = 'Transfer' THEN 1 ELSE 2 END)
FROM view_InboundPhoneCalls
WHERE date >= DATEADD(day, -15, getdate())
) as Phonecalls
WHERE rn = 1
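The ROW_NUMBER()/CASE approach above can be checked end-to-end against the question's sample rows. A minimal sketch, run in SQLite (3.25+ for window functions) instead of SQL Server, with the 15-day date filter dropped so the 2021 sample data is kept:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE view_InboundPhoneCalls
               (status TEXT, [user] TEXT, duration TEXT, phonenumber TEXT, [date] TEXT)""")
con.executemany("INSERT INTO view_InboundPhoneCalls VALUES (?,?,?,?,?)", [
    ("Received", "Receptionnist", "00:34:03", "from: +1234567890",  "2021-09-30 16:01:57"),
    ("Received", "Receptionnist", "00:03:12", "from: +9876543210",  "2021-09-30 16:02:40"),
    ("Transfer", "User1",         "00:05:12", "+14161654965;Receptionnist;User1", "2021-09-30 16:01:57"),
    ("Received", "Receptionnist", "00:05:12", "from: +14161654965", "2021-09-30 16:01:57"),
])

# Number the rows inside each (duration, date) group, Transfer first,
# then keep only row 1 of every group.
rows = con.execute("""
    SELECT status, [user], duration, phonenumber, [date]
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY duration, [date]
                   ORDER BY CASE WHEN status = 'Transfer' THEN 1 ELSE 2 END
               ) AS rn
        FROM view_InboundPhoneCalls
    ) AS Phonecalls
    WHERE rn = 1
""").fetchall()
for r in rows:
    print(r)
```

Three rows come back: the two unique Received calls plus the Transfer row of the duplicated pair, exactly as in the desired output.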


Collapse multiple rows into a single row based upon a break condition

I have a simple-sounding requirement that has had me stumped for a day or so now, so it's time to seek help from the experts.
My requirement is simply to roll up multiple rows into a single row based upon a break condition: whenever any of Employee ID, Allowance Plan, Allowance Amount or To Date changes, a new output row starts, if that makes sense.
An example source data set is shown below:
and the target data after collapsing the rows should look like this:
As you can see, I don't need any kind of running totals calculated; I just need to collapse the rows into a single record per From Date/To Date combination.
So far I have tried the following SQL using a GROUP BY and MIN function
select [Employee ID], [Allowance Plan],
min([From Date]), max([To Date]), [Allowance Amount]
from [dbo].[#AllowInfo]
group by [Employee ID], [Allowance Plan], [Allowance Amount]
but that just gives me a single row and does not take into account the break condition.
What do I need to do so that the records are rolled up (correct me if that is not the right terminology) correctly, taking the break condition into account?
Any help is appreciated.
Thank you.
Note that your test data does not really exercise the algorithm that well - e.g. you only have one employee and one plan. Also, as you described it, you would end up with 4 rows, as there is a change of ToDate between rows 7->8, 8->9, 9->10 and 10->11.
But I can see what you are trying to do, so this should at least get you on the right track, and it returns the expected 3 rows. I have taken the end of a group to be where either employee/plan/amount has changed, or where ToDate is not null (or where we reach the end of the data).
CREATE TABLE #data
(
RowID INT,
EmployeeID INT,
AllowancePlan VARCHAR(30),
FromDate DATE,
ToDate DATE,
AllowanceAmount DECIMAL(12,2)
);
-- the VALUES below use dd/mm/yyyy literals, so tell SQL Server how to parse them
SET DATEFORMAT dmy;
INSERT INTO #data(RowID, EmployeeID, AllowancePlan, FromDate, ToDate, AllowanceAmount)
VALUES
(1,200690,'CarAllowance','30/03/2017', NULL, 1000.0),
(2,200690,'CarAllowance','01/08/2017', NULL, 1000.0),
(6,200690,'CarAllowance','23/04/2018', NULL, 1000.0),
(7,200690,'CarAllowance','30/03/2018', NULL, 1000.0),
(8,200690,'CarAllowance','21/06/2018', '01/04/2019', 1000.0),
(9,200690,'CarAllowance','04/11/2021', NULL, 1000.0),
(10,200690,'CarAllowance','30/03/2017', '13/05/2022', 1000.0),
(11,200690,'CarAllowance','14/05/2022', NULL, 850.0);
-- find where the break points are
WITH chg AS
(
SELECT *,
CASE WHEN LAG(EmployeeID, 1, -1) OVER(ORDER BY RowID) != EmployeeID
OR LAG(AllowancePlan, 1, 'X') OVER(ORDER BY RowID) != AllowancePlan
OR LAG(AllowanceAmount, 1, -1) OVER(ORDER BY RowID) != AllowanceAmount
OR LAG(ToDate, 1) OVER(ORDER BY RowID) IS NOT NULL
THEN 1 ELSE 0 END AS NewGroup
FROM #data
),
-- count the number of break points as we go to group the related rows
grp AS
(
SELECT chg.*,
ISNULL(
SUM(NewGroup)
OVER (ORDER BY RowID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
0) AS grpNum
FROM chg
)
SELECT MIN(grp.RowID) AS RowID,
MAX(grp.EmployeeID) AS EmployeeID,
MAX(grp.AllowancePlan) AS AllowancePlan,
MIN(grp.FromDate) AS FromDate,
MAX(grp.ToDate) AS ToDate,
MAX(grp.AllowanceAmount) AS AllowanceAmount
FROM grp
GROUP BY grpNum
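The LAG()-based break detection plus running SUM can be sketched end-to-end. The snippet below runs in SQLite (standing in for SQL Server) on a trimmed-down version of the sample data - one employee, ISO dates - so the three expected groups are easy to see:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE data
    (RowID INT, EmployeeID INT, AllowancePlan TEXT,
     FromDate TEXT, ToDate TEXT, AllowanceAmount REAL)""")
con.executemany("INSERT INTO data VALUES (?,?,?,?,?,?)", [
    (1, 100, 'CarAllowance', '2017-03-30', None,         1000.0),
    (2, 100, 'CarAllowance', '2017-08-01', None,         1000.0),
    (3, 100, 'CarAllowance', '2018-06-21', '2019-04-01', 1000.0),
    (4, 100, 'CarAllowance', '2021-11-04', '2022-05-13', 1000.0),
    (5, 100, 'CarAllowance', '2022-05-14', None,          850.0),
])

rows = con.execute("""
    WITH chg AS (   -- flag the first row of each new group
        SELECT *,
               CASE WHEN LAG(EmployeeID, 1, -1)      OVER (ORDER BY RowID) <> EmployeeID
                      OR LAG(AllowancePlan, 1, 'X')  OVER (ORDER BY RowID) <> AllowancePlan
                      OR LAG(AllowanceAmount, 1, -1) OVER (ORDER BY RowID) <> AllowanceAmount
                      OR LAG(ToDate, 1)              OVER (ORDER BY RowID) IS NOT NULL
                    THEN 1 ELSE 0 END AS NewGroup
        FROM data
    ),
    grp AS (        -- running count of the flags = group number
        SELECT chg.*,
               SUM(NewGroup) OVER (ORDER BY RowID
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grpNum
        FROM chg
    )
    SELECT MIN(RowID), MIN(FromDate), MAX(ToDate), MAX(AllowanceAmount)
    FROM grp
    GROUP BY grpNum
    ORDER BY grpNum
""").fetchall()
print(rows)
```

Rows 1-3 collapse into one group ending 2019-04-01, row 4 and row 5 each start a new group because the preceding ToDate is non-NULL (row 5 also changes the amount).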
One way is to give every row its effective (next non-NULL) ToDate, and then group on that:
select min(t.RowID) as RowID,
t.EmployeeID,
min(t.AllowancePlan) as AllowancePlan,
min(t.FromDate) as FromDate,
max(t.ToDate) as ToDate,
min(t.AllowanceAmount) as AllowanceAmount
from ( select t.RowID,
t.EmployeeID,
t.FromDate,
t.AllowancePlan,
t.AllowanceAmount,
case when t.ToDate is null then ( select top 1 t2.ToDate
from test t2
where t2.EmployeeID = t.EmployeeID
and t2.ToDate is not null
and t2.FromDate > t.FromDate -- t2.RowID > t.RowID
order by t2.RowID, t2.FromDate
)
else t.ToDate
end as todate
from test t
) t
group by t.EmployeeID, t.ToDate
order by t.EmployeeID, min(t.RowID)
See and test yourself in this DBFiddle
the result is
RowID  EmployeeID  AllowancePlan  FromDate    ToDate      AllowanceAmount
-----  ----------  -------------  ----------  ----------  ---------------
1      200690      CarAllowance   2017-03-30  2019-04-01  1000
9      200690      CarAllowance   2021-11-04  2022-05-13  1000
11     200690      CarAllowance   2022-05-14  (null)      850

How to solve a nested aggregate function in SQL?

I'm trying to use a nested aggregate function. I know that SQL does not support them, but I really need to do something like the query below. Basically, I want to count the number of users for each day - but only the users that haven't completed an order within a 15-day window (relative to that day) and that have completed some order within a 30-day window (relative to that day). I already know that it is not possible to solve this with a regular subquery (a subquery's values cannot change for each date). The "id" and "state" attributes relate to the orders. Also, I'm using Fivetran with Snowflake.
SELECT
db.created_at::date as Date,
count(case when
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-15,Date) and dateadd(day,-1,Date)) then db.id end)
= 0) and
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-30,Date) and dateadd(day,-16,Date)) then db.id end)
> 0) then db.user end)
FROM
data_base as db
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
In other words, I want to transform the below query in a way that the "current_date" changes for each date.
WITH completed_15_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-15,current_date) and dateadd(day,-1,current_date)
group by User
),
completed_16_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-30,current_date) and dateadd(day,-16,current_date)
group by User
)
SELECT
date(db.created_at) as Date,
count(distinct case when comp_15.Completed = 0 and comp_16.Completed > 0 then comp_15.User end) as "Total Users Churn",
count(distinct case when comp_15.Completed > 0 then comp_15.User end) as "Total Users Active",
week(Date) as Week
FROM
data_base as db
left join completed_15_days_before as comp_15 on comp_15.User = db.user
left join completed_16_days_before as comp_16 on comp_16.User = db.user
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
Does anyone have a clue on how to solve this puzzle? Thank you very much!
The following should give you roughly what you want - difficult to test without sample data, but it should be a good enough starting point for you to amend it to give you exactly what you want.
I've commented the code to hopefully explain what each section is doing.
-- set parameter for the first date you want to generate the resultset for
set start_date = TO_DATE('2020-01-01','YYYY-MM-DD');
-- calculate the number of days between the start_date and the current date
set num_days = (Select datediff(day, $start_date , current_date()+1));
--generate a list of all the dates from the start date to the current date
-- i.e. every date that needs to appear in the resultset
WITH date_list as (
select
dateadd(
day,
'-' || row_number() over (order by null),
dateadd(day, '+1', current_date())
) as date_item
from table (generator(rowcount => ($num_days)))
)
--Create a list of all the orders that are in scope
-- i.e. 30 days before the start_date up to the current date
-- amend WHERE clause to in/exclude records as appropriate
,order_list as (
SELECT created_at, rt_id
from data_base
where created_at between dateadd(day,-30,$start_date) and current_date()
and state = 'finished'
)
SELECT dl.date_item
,COUNT (DISTINCT ol30.RT_ID) AS USER_COUNT
,COUNT (ol30.RT_ID) as ORDER_COUNT
FROM date_list dl
-- get all orders between -30 and -16 days of each date in date_list
left outer join order_list ol30 on ol30.created_at between dateadd(day,-30,dl.date_item) and dateadd(day,-16,dl.date_item)
-- exclude records that have the same RT_ID as in the ol30 dataset but have a date between 0 and -15 days of the date in date_list
WHERE NOT EXISTS (SELECT ol15.RT_ID
FROM order_list ol15
WHERE ol30.RT_ID = ol15.RT_ID
AND ol15.created_at between dateadd(day,-15,dl.date_item) and dl.date_item)
GROUP BY dl.date_item
ORDER BY dl.date_item;
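Snowflake's GENERATOR() is Snowflake-specific, but the overall shape - generated date list, left join for the 30-to-16-day window, NOT EXISTS for the 15-day exclusion - can be sketched portably. Below it is rebuilt in SQLite with a recursive CTE for the date list; the table and column names (orders, rt_id, created_at) are stand-ins, and the date range is shrunk to five days to keep it readable:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (rt_id TEXT, created_at TEXT)")
con.executemany("INSERT INTO orders VALUES (?,?)", [
    ("u1", "2019-12-10"),   # finished order in the older window
    ("u2", "2019-12-12"),   # finished order in the older window
    ("u1", "2020-01-03"),   # recent order: disqualifies u1 from Jan 3 on
])

rows = con.execute("""
    WITH RECURSIVE date_list(d) AS (
        SELECT '2020-01-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM date_list WHERE d < '2020-01-05'
    )
    SELECT dl.d, COUNT(DISTINCT o30.rt_id) AS user_count
    FROM date_list dl
    -- all orders between -30 and -16 days of each generated date
    LEFT JOIN orders o30
           ON o30.created_at BETWEEN date(dl.d, '-30 day') AND date(dl.d, '-16 day')
    -- drop users who also ordered between -15 days and the date itself
    WHERE NOT EXISTS (SELECT 1
                      FROM orders o15
                      WHERE o15.rt_id = o30.rt_id
                        AND o15.created_at BETWEEN date(dl.d, '-15 day') AND dl.d)
    GROUP BY dl.d
    ORDER BY dl.d
""").fetchall()
print(rows)
```

Both users count as "churned" on Jan 1 and Jan 2; from Jan 3 onward u1's fresh order moves them into the 15-day exclusion window, so only u2 remains.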

Total number of employees logged in

I want to calculate the total number of employees logged into the system at a particular time, and also the total work hours of each.
We have the login system which stores the data in the following data structure:
1.EmpId
2.Status
3.Created
The above data is stored in the following table:
EmpId Status Created
1 In 2019-10-23 12:00:00
1 Out 2019-10-23 12:45:45
2 In 2019-10-23 14:25:40
1 In 2019-10-23 18:45:45
2 Out 2019-10-23 20:50:40
2 In 2019-10-24 1:27:24
3 In 2019-10-24 2:45:45
An In is always followed by an Out and vice versa, and an employee's work duration can be spread across days, i.e. the In and Out can fall on different days.
I need to implement the following:
How to calculate the number of employees logged in at a particular time say, "2019-10-23 14:12:45".
How to calculate the total work hours of all the employees since start?
You can use this query to get the employees logged in at a specific time. Note that the Status = 'In' filter has to sit outside the subquery: LEAD() must run over all rows so that each 'In' is paired with its following 'Out', not with the next 'In':
SELECT EmpID, [In], [Out]
FROM (
    SELECT EmpID, Status, Created AS [In],
           LEAD(Created) OVER (PARTITION BY EmpID ORDER BY Created ASC) AS [Out]
    FROM table_name
) t
WHERE Status = 'In'
  AND ('2019-10-23 14:12:45' BETWEEN [In] AND [Out]
       OR ('2019-10-23 14:12:45' >= [In] AND [Out] IS NULL));
... and the following query to get the total work hours of each employee as a TIME value, with the same Status filter applied outside the subquery:
SELECT EmpID,
       CONVERT(TIME(0), DATEADD(SECOND, ISNULL(SUM(DATEDIFF(SECOND, [In], [Out])), 0), 0), 108) AS work_hours
FROM (
    SELECT EmpID, Status, Created AS [In],
           LEAD(Created) OVER (PARTITION BY EmpID ORDER BY Created ASC) AS [Out]
    FROM table_name
) t
WHERE Status = 'In'
GROUP BY EmpID
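The LEAD() pairing can be verified against the question's exact rows. A minimal sketch in SQLite (standing in for SQL Server), summing seconds per employee for completed In/Out pairs; note that LEAD() runs over all rows and Status = 'In' is filtered afterwards:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE logins (EmpID INT, Status TEXT, Created TEXT)")
con.executemany("INSERT INTO logins VALUES (?,?,?)", [
    (1, 'In',  '2019-10-23 12:00:00'),
    (1, 'Out', '2019-10-23 12:45:45'),
    (2, 'In',  '2019-10-23 14:25:40'),
    (1, 'In',  '2019-10-23 18:45:45'),
    (2, 'Out', '2019-10-23 20:50:40'),
    (2, 'In',  '2019-10-24 01:27:24'),
    (3, 'In',  '2019-10-24 02:45:45'),
])

# Pair each event with the next event of the same employee, keep only
# the 'In' rows that have a matching 'Out', and sum the elapsed seconds.
rows = con.execute("""
    SELECT EmpID,
           SUM(strftime('%s', OutTime) - strftime('%s', InTime)) AS worked_seconds
    FROM (
        SELECT EmpID, Status, Created AS InTime,
               LEAD(Created) OVER (PARTITION BY EmpID ORDER BY Created) AS OutTime
        FROM logins
    )
    WHERE Status = 'In' AND OutTime IS NOT NULL
    GROUP BY EmpID
    ORDER BY EmpID
""").fetchall()
print(rows)
```

Employee 1 worked 45m45s (2745 s) in their completed session, employee 2 worked 6h25m (23100 s); the still-open sessions (and employee 3, who never logged out) contribute nothing.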
demo on dbfiddle.uk
You can also define and use a CTE (common table expression) instead of the sub-select to get a flat table with login time and logout time as columns (filter on Status = 'In' when you query it):
WITH Employees(EmpID, Status, [In], [Out]) AS
(
    SELECT EmpID, Status, Created AS [In],
           LEAD(Created) OVER (PARTITION BY EmpID ORDER BY Created ASC) AS [Out]
    FROM table_name
)
Assuming that ins and outs are interleaved, you can use conditional aggregation and filtering:
select sum(case when status = 'in' then 1
                when status = 'out' then -1
           end) as employees_at_time
from t
where created <= '2019-10-23 14:12:45';
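The conditional-aggregation idea is easy to sanity-check: every 'In' up to the cutoff adds 1, every 'Out' subtracts 1, so the sum is the headcount at that moment. A minimal SQLite sketch using the question's rows, with a 19:00 cutoff so the count is non-zero:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE logins (EmpID INT, Status TEXT, Created TEXT)")
con.executemany("INSERT INTO logins VALUES (?,?,?)", [
    (1, 'In',  '2019-10-23 12:00:00'),
    (1, 'Out', '2019-10-23 12:45:45'),
    (2, 'In',  '2019-10-23 14:25:40'),
    (1, 'In',  '2019-10-23 18:45:45'),
    (2, 'Out', '2019-10-23 20:50:40'),
    (2, 'In',  '2019-10-24 01:27:24'),
    (3, 'In',  '2019-10-24 02:45:45'),
])

# +1 per 'In', -1 per 'Out', over all events up to the cutoff time.
(logged_in,) = con.execute("""
    SELECT SUM(CASE WHEN Status = 'In'  THEN 1
                    WHEN Status = 'Out' THEN -1 END)
    FROM logins
    WHERE Created <= '2019-10-23 19:00:00'
""").fetchone()
print(logged_in)
```

At 19:00 both employee 1 (back in since 18:45:45) and employee 2 (in since 14:25:40) are logged in, so the sum is 2.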

SQL Oracle - Find all combination of events possible based on date

I have a table with the following information:
Data Sample
**Table 1**
palletNumber-- event-- --recordDate
-----1-----------A-------01/11/2015 01:00
-----1-----------B-------01/11/2015 02:00
-----1-----------C-------01/11/2015 03:00
-----1-----------D-------01/11/2015 04:00
-----2-----------A-------01/11/2015 01:10
-----2-----------C-------01/11/2015 01:15
-----2-----------E-------01/11/2015 01:20
I want to select all the possible combinations of consecutive events that appear in the table, in recordDate sequence, per palletNumber. I tried various statements with ROW_NUMBER() and OVER (PARTITION BY ...) but this did not get me close to what I am looking for... Any direction on where to go?
This would be the output table for example:
**Table 2**
event1-- event2--
---A------B------
---B------C------
---C------D------
---A------C------
---C------E------
Thanks,
You can get the previous or next event using lag() or lead():
select event,
lead(event) over (partition by palletnumber order by recorddate) as next_event
from datasample;
If you want to eliminate duplicates, I would be inclined to use group by, because this also gives the ability to count the number of times that each pair appears:
select event, next_event, count(*) as cnt
from (select event,
lead(event) over (partition by palletnumber order by recorddate) as next_event
from datasample
) ds
group by event, next_event;
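The LEAD()-plus-GROUP BY version can be run against the question's sample data. A minimal sketch in SQLite (standing in for Oracle), with the trailing NULL pair of each pallet dropped so only real event-to-event transitions are counted:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE datasample (palletNumber INT, event TEXT, recordDate TEXT)")
con.executemany("INSERT INTO datasample VALUES (?,?,?)", [
    (1, 'A', '2015-11-01 01:00'), (1, 'B', '2015-11-01 02:00'),
    (1, 'C', '2015-11-01 03:00'), (1, 'D', '2015-11-01 04:00'),
    (2, 'A', '2015-11-01 01:10'), (2, 'C', '2015-11-01 01:15'),
    (2, 'E', '2015-11-01 01:20'),
])

# Pair each event with the next event on the same pallet, then count pairs.
rows = con.execute("""
    SELECT event, next_event, COUNT(*) AS cnt
    FROM (
        SELECT event,
               LEAD(event) OVER (PARTITION BY palletNumber
                                 ORDER BY recordDate) AS next_event
        FROM datasample
    ) ds
    WHERE next_event IS NOT NULL
    GROUP BY event, next_event
    ORDER BY event, next_event
""").fetchall()
print(rows)
```

This yields exactly the five transitions from the question's Table 2 - (A,B), (B,C), (C,D) from pallet 1 and (A,C), (C,E) from pallet 2 - each seen once.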
Use CASE:
select case when palletNumber = 1 then event else null end as event1,
case when palletNumber = 2 then event else null end as event2,
recordDate
from table1
Then you can work with the data using lead/lag or sum() / group by to get it in one row.
Assuming events 1/2 only have one record per date
select recordDate, max (event1), max (event2)
from ( select case when palletNumber = 1 then event else null end as event1,
case when palletNumber = 2 then event else null end as event2,
recordDate
from table1
order by recordDate) tab2
group by recordDate

Retrieving data dependent on attributes

Hi everyone. I can't work out the following query. Please help.
The initial data and output are in the linked Excel sheet (initial data/output, Google Drive).
Here is the logic: for Rest = 2500, it takes the minimum value of Date, increments it by one, and puts it into the Date1 column of the output; Date2 receives the minimum Date of the next Rest value (1181,85), and so on: Date1 then receives Rest (1181,85)'s minimum Date (14.01.2013) incremented by one (15.01.2013), and so on. Rows with a Rest value of zero should simply be skipped, but we can't delete them up front, because their Date is used for Date2, as explained above. There are many accNumbers, and it should list all of them. I hope you understood; if not, ask for more details. Thanks in advance. I'm using SQL Server.
If I've understood you correctly, you want to group the items by rest number, and then display the minimum date + 1 day, as well as the minimum date for the "next" rest number. What are you expecting to happen when the Rest number is 0 in two different places?
with Base as
(
select t.AccNum,
t.Rest,
DATEADD(day, 1, MIN(t.Date)) as [StartDate],
ROW_NUMBER() OVER (ORDER BY MIN(t.Date)) as RowNumber
from Accounts t
where t.Rest <> 0
group by t.AccNum, t.Rest
)
select a.AccNum, a.Rest, a.StartDate, DATEADD(DAY, -1, b.StartDate) as [EndDate]
from Base a
left join Base b
on a.RowNumber = b.RowNumber - 1
order by a.[StartDate]
If there's the possibility of the Rest number being duplicated further down, but that needing to be a separate item, then we need to be a bit cleverer in our initial select query.
with Base as
(
select b.AccNum, b.Rest, MIN(DATEADD(day, 1, b.Date)) as [StartDate], ROW_NUMBER() OVER (ORDER BY MIN(Date)) as [RowNumber]
from (
select *, ROW_NUMBER() OVER (PARTITION BY Rest ORDER BY Date) - ROW_NUMBER() OVER (ORDER BY Date) as [Order]
from Accounts a
-- where a.Rest <> 0
-- If we're still filtering out Rest 0 uncomment the above line
) b
group by [order], AccNum, Rest
)
select a.RowNumber, a.AccNum, a.Rest, a.StartDate, DATEADD(DAY, -1, b.StartDate) as [EndDate]
from Base a
left join Base b
on a.RowNumber = b.RowNumber - 1
order by a.[StartDate]
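The first query's idea can be sketched compactly: group by (AccNum, Rest), take MIN(Date) + 1 day as the start, and the next group's start minus 1 day as the end, skipping Rest = 0 rows. The snippet below runs in SQLite (standing in for SQL Server) on simplified, hypothetical data - table and column names are stand-ins:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Accounts (AccNum TEXT, Rest REAL, d TEXT)")
con.executemany("INSERT INTO Accounts VALUES (?,?,?)", [
    ('A', 2500, '2012-12-31'),
    ('A', 2500, '2013-01-05'),
    ('A',    0, '2013-01-10'),   # Rest = 0: skipped
    ('A', 1181, '2013-01-14'),
    ('A', 2431, '2013-01-31'),
])

rows = con.execute("""
    WITH agg AS (       -- earliest date per (account, rest), zeros skipped
        SELECT AccNum, Rest, MIN(d) AS min_d
        FROM Accounts
        WHERE Rest <> 0
        GROUP BY AccNum, Rest
    ),
    Base AS (           -- start date = min date + 1, numbered in date order
        SELECT AccNum, Rest,
               date(min_d, '+1 day') AS StartDate,
               ROW_NUMBER() OVER (ORDER BY min_d) AS rn
        FROM agg
    )
    SELECT a.AccNum, a.Rest, a.StartDate, date(b.StartDate, '-1 day') AS EndDate
    FROM Base a
    LEFT JOIN Base b ON b.rn = a.rn + 1
    ORDER BY a.StartDate
""").fetchall()
print(rows)
```

Each period runs from its start date up to the day before the next period's start, and the last period's EndDate is NULL, mirroring the result table below.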
Results for both queries:
Account Number REST Start Date End Date
45817840200000057948 2500 2013-01-01 2013-01-14
45817840200000057948 1181 2013-01-15 2013-01-31
45817840200000057948 2431 2013-02-01 2013-02-09
45817840200000057948 1563 2013-02-10 NULL