Find duplicates within a specific period - sql

I have a table with the following structure
ID Person LOG_TIME
-----------------------------------
1 1 2012-05-21 13:03:11.550
2 1 2012-05-22 13:09:37.050 <--- this is duplicate
3 1 2012-05-28 13:09:37.183
4 2 2012-05-20 15:09:37.230
5 2 2012-05-22 13:03:11.990 <--- this is duplicate
6 2 2012-05-24 04:04:13.222 <--- this is duplicate
7 2 2012-05-29 11:09:37.240
I have some application job that fills this table with data.
There is a business rule that each person should have only 1 record in every 7 days.
From the above example, records # 2,5 and 6 are considered duplicates while 1,3,4 and 7 are OK.
I want to have a SQL query that checks if there are records for the same person in less than 7 days.

;WITH cte AS
(
SELECT ID, Person, LOG_TIME,
DATEDIFF(d, MIN(LOG_TIME) OVER (PARTITION BY Person), LOG_TIME) AS diff_date
FROM dbo.Log_time
)
SELECT *
FROM cte
WHERE diff_date BETWEEN 1 AND 6
Demo on SQLFiddle

Please see my attempt on SQLFiddle here.
You can use a join based on DATEDIFF() to find records which are logged less than 7 days apart:
WITH TooClose
AS
(
SELECT
a.ID AS BeforeID,
b.ID AS AfterID
FROM
Log a
INNER JOIN Log b ON a.Person = b.Person
AND a.LOG_TIME < b.LOG_TIME
AND DATEDIFF(DAY, a.LOG_TIME, b.LOG_TIME) < 7
)
However, this will include records which you don't consider "duplicates" (for instance, ID 3, because it is too close to ID 2). From what you've said, I'm inferring that a record isn't a "duplicate" if the record it is too close to is itself a "duplicate".
So to apply this rule and get the final list of duplicates:
SELECT
AfterID AS ID
FROM
TooClose
WHERE
BeforeID NOT IN (SELECT AfterID FROM TooClose)
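To try the two steps above end to end without a SQL Server instance, here is a minimal sketch that runs the same join and filter in SQLite through Python's sqlite3 module. It uses the sample rows from the question. SQLite has no DATEDIFF, so the day difference is approximated by comparing calendar dates with julianday(date(...)); that substitution, the ROWS and QUERY names, and the find_duplicates helper are assumptions made for this illustration and are not part of the original answer.

```python
import sqlite3

# Sample rows copied from the question: (ID, Person, LOG_TIME).
ROWS = [
    (1, 1, "2012-05-21 13:03:11.550"),
    (2, 1, "2012-05-22 13:09:37.050"),
    (3, 1, "2012-05-28 13:09:37.183"),
    (4, 2, "2012-05-20 15:09:37.230"),
    (5, 2, "2012-05-22 13:03:11.990"),
    (6, 2, "2012-05-24 04:04:13.222"),
    (7, 2, "2012-05-29 11:09:37.240"),
]

QUERY = """
WITH TooClose AS (
    SELECT a.ID AS BeforeID, b.ID AS AfterID
    FROM Log a
    INNER JOIN Log b
        ON a.Person = b.Person
       AND a.LOG_TIME < b.LOG_TIME
       -- Assumption: comparing calendar dates stands in for
       -- DATEDIFF(DAY, ...), which counts day boundaries crossed.
       AND julianday(date(b.LOG_TIME)) - julianday(date(a.LOG_TIME)) < 7
)
SELECT DISTINCT AfterID AS ID
FROM TooClose
-- Keep a record only if the earlier record it is too close to
-- is not itself flagged as being too close to something earlier.
WHERE BeforeID NOT IN (SELECT AfterID FROM TooClose)
ORDER BY ID
"""


def find_duplicates(rows):
    """Load rows into an in-memory table and return the duplicate IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Log (ID INTEGER, Person INTEGER, LOG_TIME TEXT)")
    conn.executemany("INSERT INTO Log VALUES (?, ?, ?)", rows)
    ids = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return ids


if __name__ == "__main__":
    print(find_duplicates(ROWS))
```

DISTINCT is there because a record such as ID 6 can be too close to more than one earlier record.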

Please take a look at this sample.
Reference: SQLFIDDLE
Query:
select person,
datediff(max(log_time),min(log_time)) as diff,
count(log_time)
from pers
group by person
;
select y.person, y.ct
from (
select person,
datediff(max(log_time),min(log_time)) as diff,
count(log_time) as ct
from pers
group by person) as y
where y.ct > 1
and y.diff <= 7
;
PERSON DIFF COUNT(LOG_TIME)
1 1 3
2 8 3
PERSON CT
1 3

declare @Count int
set @Count=(
select COUNT(*)
from timeslot
where (( (TimeFrom<@Timefrom and TimeTo >@Timefrom)
or (TimeFrom<@Timeto and TimeTo >@Timeto))
or (TimeFrom=@Timefrom or TimeTo=@Timeto)))

Related

Extra column looking at where OpenDate > ClosedDate previous records

I have a hard struggle with this problem.
I have the following table:
TicketNumber  OpenTicketDate YYYY_MM  ClosedTicketDate YYYY_MM
--------------------------------------------------------------
1             2018-1                  2020-1
2             2018-2                  2021-2
3             2019-1                  2020-6
4             2020-7                  2021-1
I would like to create an extra column which would monitor the open tickets at the given OpenTicketDate.
So the new table would look like this:
TicketNumber  OpenTicketDate YYYY_MM  ClosedTicketDate YYYY_MM  OpenTicketsLookingBackwards
-------------------------------------------------------------------------------------------
1             2018-1                  2020-1                    1
2             2018-2                  2021-2                    2
3             2019-1                  2020-6                    3
4             2020-7                  2021-1                    2
The fourth (extra) column counts the current record plus any previous records whose ClosedTicketDate is later than the current record's OpenTicketDate.
For example, TicketNumber 4 has 2 open tickets because, among tickets 1 to 4, only two (tickets 2 and 4) have a ClosedTicketDate later than ticket 4's OpenTicketDate.
The new column is filled only by looking at previous records. It looks backward, not forward.
Is there anyone who can help me out?
You could perform a self join and aggregate as the following:
Select T.TicketNumber, T.OpenTicketDate, T.ClosedTicketDate,
Count(*) as OpenTicketsLookingBackwards
From table_name T Left Join table_name D
On Cast(concat(T.OpenTicketDate,'-1') as Date) < Cast(concat(D.ClosedTicketDate,'-1') as Date)
And T.ticketnumber >= D.ticketnumber
Group By T.TicketNumber, T.OpenTicketDate, T.ClosedTicketDate
Order By T.TicketNumber
You may also try with a scalar subquery as the following:
Select T.TicketNumber, T.OpenTicketDate, T.ClosedTicketDate,
(
Select Count(*) From table_name D
Where Cast(concat(T.OpenTicketDate,'-1') as Date) <
Cast(concat(D.ClosedTicketDate,'-1') as Date)
And T.ticketnumber >= D.ticketnumber
) As OpenTicketsLookingBackwards
From table_name T
Order By T.TicketNumber
Joins often outperform correlated subqueries, but this depends on the database and its optimizer, so compare both on your own data.
See a demo.
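To check the counting rule used by both queries independently of any database, here is a small sketch in plain Python. It represents each YYYY-M value as a (year, month) tuple so that ordinary tuple comparison orders the dates correctly. The TICKETS list and the function name are made up for this illustration; the rule itself (an earlier-or-same ticket counts if it closed after the current ticket opened) is the one from the join condition above.

```python
# Tickets from the question: (TicketNumber, opened, closed),
# with each date written as a (year, month) tuple.
TICKETS = [
    (1, (2018, 1), (2020, 1)),
    (2, (2018, 2), (2021, 2)),
    (3, (2019, 1), (2020, 6)),
    (4, (2020, 7), (2021, 1)),
]


def open_tickets_looking_backwards(tickets):
    """For each ticket, count the tickets up to and including it
    whose close date is later than this ticket's open date."""
    result = {}
    for number, opened, _closed in tickets:
        result[number] = sum(
            1
            for other_number, _other_opened, other_closed in tickets
            if other_number <= number and other_closed > opened
        )
    return result


if __name__ == "__main__":
    for number, count in open_tickets_looking_backwards(TICKETS).items():
        print(number, count)
```

This mirrors the scalar subquery version: the inner generator plays the role of the correlated Count(*).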

Postgresql How to Calculate between 2 date depends on another table

Let's say I have two tables like this:
workday_emp
emp_id work_start work_end
1 "2021-04-06" "2021-04-14"
2 "2021-04-27" "2021-05-04"
3 "2021-04-30" "2021-05-07"
holiday_tbl
id name date
1 "holiday 1" "2021-04-07"
2 "holiday 2" "2021-04-28"
3 "holiday 3" "2021-04-29"
I want to show a table like this with a query:
emp_id work_start work_end day_holiday
1 "2021-04-06" "2021-04-14" 1
2 "2021-04-27" "2021-05-04" 2
3 "2021-04-30" "2021-05-07" 1
The question is: how do I calculate how many holidays ("day_holiday") fall between "work_start" and "work_end", based on the "holiday_tbl" table?
Please try this. For employee 3 the holiday count will be 0, not 1, because his work period starts on April 30 but the last holiday was on April 29.
-- PostgreSQL(v11)
SELECT w.emp_id, w.work_start, w.work_end
, (SELECT COUNT(id)
FROM holiday_tbl
WHERE date BETWEEN w.work_start AND w.work_end) day_holiday
FROM workday_emp w
Please check from url https://dbfiddle.uk/?rdbms=postgres_11&fiddle=1948691b58ba841b2765d7de383f8df8
This should do the job:
SELECT emp_id, work_start, work_end, COUNT(ht.holiday) holiday_cnt
FROM workday_emp we LEFT JOIN
(
SELECT date holiday
FROM holiday_tbl
) ht ON ht.holiday BETWEEN we.work_start AND we.work_end
GROUP BY 1, 2, 3
ORDER BY 1, 2;
db<>fiddle
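For a quick local check of the LEFT JOIN approach, the sketch below loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for PostgreSQL here, dates are stored as ISO text (which sorts correctly with BETWEEN), and the holiday_counts function name is an assumption made for this example.

```python
import sqlite3


def holiday_counts():
    """Return (emp_id, day_holiday) pairs for the question's sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE workday_emp (emp_id INTEGER, work_start TEXT, work_end TEXT);
        CREATE TABLE holiday_tbl (id INTEGER, name TEXT, date TEXT);
        INSERT INTO workday_emp VALUES
            (1, '2021-04-06', '2021-04-14'),
            (2, '2021-04-27', '2021-05-04'),
            (3, '2021-04-30', '2021-05-07');
        INSERT INTO holiday_tbl VALUES
            (1, 'holiday 1', '2021-04-07'),
            (2, 'holiday 2', '2021-04-28'),
            (3, 'holiday 3', '2021-04-29');
    """)
    rows = conn.execute("""
        SELECT we.emp_id, COUNT(ht.id) AS day_holiday
        FROM workday_emp we
        LEFT JOIN holiday_tbl ht
            ON ht.date BETWEEN we.work_start AND we.work_end
        GROUP BY we.emp_id
        ORDER BY we.emp_id
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for emp_id, day_holiday in holiday_counts():
        print(emp_id, day_holiday)
```

Counting ht.id rather than * matters with a LEFT JOIN: an employee with no matching holiday still produces one row, but its ht.id is NULL and is not counted.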

SQL To delete number of items is less than required item number

I have two tables - StepModels (support plan) and FeedbackStepModels (feedback), StepModels keeps how many steps each support plan requires.
SELECT [SupportPlanID],COUNT(*)AS Steps
FROM [StepModels]
GROUP BY SupportPlanID
SupportPlanID (Steps)
-------------------------------
1 4
2 9
3 3
4 10
FeedbackStepModels keeps the steps each employee entered into the system.
SELECT [FeedbackID],SupportPlanID,Count(*)AS StepsNumber
FROM [FeedbackStepModels]
GROUP BY FeedbackID,SupportPlanID
FeedbackID SupportPlanID StepsNumber
---------------------------------------------
1 1 3 --> this is supposed to be 4
2 2 9 --> Correct
3 3 0 --> this is supposed to be 3
4 4 10 --> Correct
If the total number of submitted feedback steps is less than the required total, I want to delete this wrong entry from the database. Basically, I need to delete FeedbackID 1 and 3.
I can load the data into a List, compare, and delete it, but I want to know if we can do this in SQL rather than in C# code.
You can use the query below to remove your unwanted data by SQL Script
DELETE f
FROM FeedbackStepModels f
INNER JOIN (
SELECT [FeedbackID],SupportPlanID, Count(*) AS StepsNumber
FROM [FeedbackStepModels]
GROUP BY FeedbackID,SupportPlanID
) f_derived on f_derived.FeedbackID = f.FeedbackID and f_derived.SupportPlanID = f.SupportPlanID
INNER JOIN (
SELECT [SupportPlanID],COUNT(*)AS Steps
FROM [StepModels]
GROUP BY SupportPlanID
) s_derived on s_derived.SupportPlanID = f.SupportPlanID
WHERE f_derived.StepsNumber < s_derived.Steps
I think you want something like this.
DELETE FROM [FeedbackStepModels]
WHERE FeedbackID IN
(
SELECT a.FeedbackID
FROM
(
SELECT [FeedbackID],
SupportPlanID,
COUNT(*) AS StepsNumber
FROM [FeedbackStepModels]
GROUP BY FeedbackID,
SupportPlanID
) AS a
INNER JOIN
(
SELECT [SupportPlanID],
COUNT(*) AS Steps
FROM [StepModels]
GROUP BY SupportPlanID
) AS b ON a.SupportPlanID = b.[SupportPlanID]
WHERE a.StepsNumber < b.Steps
);
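The DELETE ... WHERE ... IN form can be tried out on a small in-memory SQLite database, as in the sketch below. The StepNo column, the sample rows, and the function name are hypothetical and exist only to give the two tables something to count; the question does not show the real column layout. SQLite does not accept the T-SQL DELETE f FROM ... JOIN syntax, so only the subquery form is shown.

```python
import sqlite3


def delete_incomplete_feedback():
    """Delete feedback with fewer steps than its plan requires and
    return the FeedbackIDs that remain."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE StepModels (SupportPlanID INTEGER, StepNo INTEGER);
        CREATE TABLE FeedbackStepModels (
            FeedbackID INTEGER, SupportPlanID INTEGER, StepNo INTEGER);
        -- Hypothetical data: plan 1 needs 4 steps, plan 2 needs 2 steps.
        INSERT INTO StepModels VALUES (1,1),(1,2),(1,3),(1,4),(2,1),(2,2);
        -- Feedback 1 covers only 3 of 4 steps; feedback 2 is complete.
        INSERT INTO FeedbackStepModels VALUES
            (1,1,1),(1,1,2),(1,1,3),(2,2,1),(2,2,2);
    """)
    conn.execute("""
        DELETE FROM FeedbackStepModels
        WHERE FeedbackID IN (
            SELECT a.FeedbackID
            FROM (SELECT FeedbackID, SupportPlanID, COUNT(*) AS StepsNumber
                  FROM FeedbackStepModels
                  GROUP BY FeedbackID, SupportPlanID) AS a
            INNER JOIN (SELECT SupportPlanID, COUNT(*) AS Steps
                        FROM StepModels
                        GROUP BY SupportPlanID) AS b
                ON a.SupportPlanID = b.SupportPlanID
            WHERE a.StepsNumber < b.Steps
        )
    """)
    remaining = conn.execute(
        "SELECT DISTINCT FeedbackID FROM FeedbackStepModels ORDER BY FeedbackID"
    ).fetchall()
    conn.close()
    return [row[0] for row in remaining]


if __name__ == "__main__":
    print(delete_incomplete_feedback())
```

Note that a feedback with no rows at all in FeedbackStepModels never appears in the grouped subquery, so there is nothing for this statement to delete in that case.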

SQL with nested condition

EDIT: added third requirement after playing with solution from Tim Biegeleisen
EDIT2: modified Robbie's DOB to be before his parent's marriage date
I am trying to create a query that will look at two tables and determine the difference in dates based on a percentage. I know, super confusing... Let me try and explain using the tables below:
Bob and Mary are married on 2010-01-01 and expect 4 kids (Parent table)
I want to know how many years it took until they met 50% of their expected kids (i.e. 2/4 kids). Using the Child table to see the DOB of their 4 kids, we know that Frankie is the second child, which meets our 50% threshold, so we take the difference between Frankie's DOB and his parents' marriage date and end up with 3 years!
If the goal isn't reached, then display no value, e.g. Mick and Jo have only had 1 child so far, so they haven't yet reached their goal.
Hoping this is doable using BigQuery standard SQL.
Parent table
id married_couple married_at expected_kids
--------------------------------------
1 Bob and Mary 2010-01-01 4
2 Mick and Jo 2010-01-01 4
Child table
id child_name parent_id date_of_birth
--------------------------------------
1 Eddie 1 2012-01-01
2 Frankie 1 2013-01-01
3 Robbie 1 2005-01-01
4 Duncan 1 2015-01-01
5 Rick 2 2014-01-01
Expected SQL result
parent_id half_goal_reached(years)
--------------------------------------
1 3
2
Below are two solutions for BigQuery Standard SQL.
The first one is more in the classic SQL way, the second one is more BigQuery style (I think).
First Solution: with an analytic function
#standardSQL
SELECT
parent_id,
IF(
MAX(pos) = MAX(CAST(expected_kids / 2 AS INT64)),
MAX(DATE_DIFF(date_of_birth, married_at, YEAR)),
NULL
) AS half_goal_reached
FROM (
SELECT c.parent_id, c.date_of_birth, expected_kids, married_at,
ROW_NUMBER() OVER(PARTITION BY c.parent_id ORDER BY c.date_of_birth) AS pos
FROM `child` AS c
JOIN `parent` AS p
ON c.parent_id = p.id
)
WHERE pos <= CAST(expected_kids / 2 AS INT64)
GROUP BY parent_id
Second Solution: with use of ARRAY
#standardSQL
SELECT
parent_id,
DATE_DIFF(dates[SAFE_ORDINAL(CAST(expected_kids / 2 AS INT64))], married_at, YEAR) AS half_goal_reached
FROM (
SELECT
parent_id,
ARRAY_AGG(date_of_birth ORDER BY date_of_birth) AS dates,
MAX(expected_kids) AS expected_kids,
MAX(married_at) AS married_at
FROM `child` AS c
JOIN `parent` AS p
ON c.parent_id = p.id
GROUP BY parent_id
)
Dummy Data
You can test / play with both solutions using below dummy data
#standardSQL
WITH `parent` AS (
SELECT 1 id, 'Bob and Mary' married_couple, DATE '2010-01-01' married_at, 4 expected_kids UNION ALL
SELECT 2, 'Mick and Jo', DATE '2010-01-01', 4
),
`child` AS (
SELECT 1 id, 'Eddie' child_name, 1 parent_id, DATE '2012-01-01' date_of_birth UNION ALL
SELECT 2, 'Frankie', 1, DATE '2013-01-01' UNION ALL
SELECT 3, 'Robbie', 1, DATE '2014-01-01' UNION ALL
SELECT 4, 'Duncan', 1, DATE '2015-01-01' UNION ALL
SELECT 5, 'Rick', 2, DATE '2014-01-01'
)
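As a cross-check of what both queries should return on the dummy data, here is a plain Python sketch of the same rule: sort each couple's children by date of birth, pick the child at position expected_kids / 2, and take the difference in calendar years from the marriage date. The PARENTS and CHILDREN structures and the function name are invented for this illustration. The position is computed as (expected_kids + 1) // 2 on the assumption that BigQuery's CAST of a floating-point value to INT64 rounds halfway cases away from zero, which only matters for odd values of expected_kids.

```python
from datetime import date

# Dummy data from the answer: parent id -> (married_at, expected_kids).
PARENTS = {
    1: (date(2010, 1, 1), 4),
    2: (date(2010, 1, 1), 4),
}

# (parent_id, date_of_birth) for each child.
CHILDREN = [
    (1, date(2012, 1, 1)),
    (1, date(2013, 1, 1)),
    (1, date(2014, 1, 1)),
    (1, date(2015, 1, 1)),
    (2, date(2014, 1, 1)),
]


def half_goal_years(parents, children):
    """Return parent_id -> years until half the expected kids were born,
    or None if that many children have not been born yet."""
    result = {}
    for parent_id, (married_at, expected_kids) in parents.items():
        dobs = sorted(dob for pid, dob in children if pid == parent_id)
        # Assumed rounding: halves round up, e.g. 3 expected kids -> 2nd child.
        needed = (expected_kids + 1) // 2
        if 1 <= needed <= len(dobs):
            # Difference in calendar years, like DATE_DIFF(..., YEAR).
            result[parent_id] = dobs[needed - 1].year - married_at.year
        else:
            result[parent_id] = None
    return result


if __name__ == "__main__":
    print(half_goal_years(PARENTS, CHILDREN))
```

The None case corresponds to SAFE_ORDINAL returning NULL in the array version when the requested position is past the end of the array.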
Try the following query. The logic is a bit verbose, so here is the outline: I join the parent and child tables, lining up the parent id, the number of years elapsed since marriage, the running number of children, and the expected number of children. With this information in hand, we can find the first row whose running number of children matches or exceeds half of the expected number.
SELECT parent_id, num_years AS half_goal_reached
FROM
(
SELECT parent_id, num_years, cnt, expected_kids,
ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY num_years) rn
FROM
(
SELECT
t2.parent_id,
YEAR(t2.date_of_birth) - YEAR(t1.married_at) AS num_years,
(SELECT COUNT(*) FROM child c
WHERE c.parent_id = t2.parent_id AND
c.date_of_birth <= t2.date_of_birth) AS cnt,
t1.expected_kids
FROM parent t1
INNER JOIN child t2
ON t1.id = t2.parent_id
) t
WHERE
cnt >= expected_kids / 2
) t
WHERE t.rn = 1;
Note that there may be issues with how I computed the yearly differences, or how I computed the threshold for half the number of expected children. Also, if we were using a recent enterprise database we could have used an analytic function to get the running number of children instead of a correlated subquery, but I was unsure whether BigQuery would support that, so I used the latter.

SQL Server : max date and inner join

I have two tables, one is a list of tasks. The other containing historical values for those tasks.
I need to generate a list of the latest event (and its description) for each check, as long as its Date_Executed is less than the current datetime minus the TimeFrame (TimeFrame being the number of hours within which the task has to be done, formatted for use in DATEADD), but only if the check has active = 1.
Table: checks
Check_id description TimeFrame active
1 Task One -24 0
2 Task Two -24 0
3 Task Forty -48 1
4 Task Somehin -128 1
Table: events
Event_id Check_id Comment Date_Executed User_Executed
1 1 NULL 2012-09-18 16:10:44.917 admin
2 1 NULL 2012-09-25 11:39:01.000 jeff
3 4 Failed 2012-09-25 13:20:09.930 steve
4 4 Half failed 2012-09-25 13:05:09.953 marsha
5 2 NULL 2012-09-25 14:02:24.000 marsha
6 3 NULL 2012-09-18 16:10:55.023 marsha
The best solutions I have so far are:
SELECT
a.[Date_Executed]
a.[Check_id],
a.[Comments],
b.[frequency],
b.[Check_id],
b.[description]
FROM
[checksdb].[dbo].events as a,
[checksdb].[dbo].checks as b
where
b.active = 1
and a.[Date_Executed] < = dateadd(HOUR,b.[frequency],GETDATE())
and a.Check_id = b.Check_id
order by Check_id, priority
and
select MAX(date_Executed), Task_id from daily_check_events group by Task_id
Neither gets me what I need. I could really use some help.
Since you are using SQL Server, which supports common table expressions and window functions, try this:
WITH latestEvents
AS
(
SELECT Event_id, Check_id, [Comment], Date_Executed, User_Executed,
ROW_NUMBER() OVER(PARTITION BY Check_ID ORDER BY DATE_Executed DESC)
AS RowNum
FROM events
)
SELECT a.[Check_id], a.[description],
b.[Date_Executed], b.[Comment]
FROM checks a
INNER JOIN latestEvents b
on a.check_ID = b.check_ID
WHERE b.RowNum = 1 AND
a.active = 1
-- other conditions here
SQLFiddle Demo
The above query will only work on an RDBMS that supports window functions. Alternatively, use the query below, which works on most RDBMSs:
SELECT a.Check_id, a.description,
c.Date_Executed, c.Comment
FROM checks a
INNER JOIN
(
SELECT check_id, MAX(Date_Executed) maxExecuted
FROM events
GROUP BY check_ID
) b ON a.check_ID = b.check_ID
INNER JOIN events c
ON c.check_ID = b.check_ID AND
c.date_executed = b.maxExecuted
WHERE a.active = 1
SQLFiddle Demo
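The MAX-and-join variant is portable enough to run unchanged on SQLite, so here is a minimal sketch that loads the question's sample rows through Python's sqlite3 module and runs it. The function name is an assumption for this example, timestamps are stored as ISO text, and the TimeFrame condition from the question is left out, just as it is in the queries above.

```python
import sqlite3


def latest_events_for_active_checks():
    """Return (Check_id, description, Date_Executed, Comment) for the
    most recent event of every active check."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE checks (
            Check_id INTEGER, description TEXT, TimeFrame INTEGER, active INTEGER);
        CREATE TABLE events (
            Event_id INTEGER, Check_id INTEGER, Comment TEXT,
            Date_Executed TEXT, User_Executed TEXT);
        INSERT INTO checks VALUES
            (1, 'Task One', -24, 0),
            (2, 'Task Two', -24, 0),
            (3, 'Task Forty', -48, 1),
            (4, 'Task Somehin', -128, 1);
        INSERT INTO events VALUES
            (1, 1, NULL, '2012-09-18 16:10:44.917', 'admin'),
            (2, 1, NULL, '2012-09-25 11:39:01.000', 'jeff'),
            (3, 4, 'Failed', '2012-09-25 13:20:09.930', 'steve'),
            (4, 4, 'Half failed', '2012-09-25 13:05:09.953', 'marsha'),
            (5, 2, NULL, '2012-09-25 14:02:24.000', 'marsha'),
            (6, 3, NULL, '2012-09-18 16:10:55.023', 'marsha');
    """)
    rows = conn.execute("""
        SELECT a.Check_id, a.description, c.Date_Executed, c.Comment
        FROM checks a
        INNER JOIN (
            -- Latest execution time per check.
            SELECT Check_id, MAX(Date_Executed) AS maxExecuted
            FROM events
            GROUP BY Check_id
        ) b ON a.Check_id = b.Check_id
        INNER JOIN events c
            ON c.Check_id = b.Check_id
           AND c.Date_Executed = b.maxExecuted
        WHERE a.active = 1
        ORDER BY a.Check_id
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in latest_events_for_active_checks():
        print(row)
```

If two events for the same check share the exact same Date_Executed, this form returns both of them, whereas the ROW_NUMBER() version picks one.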