Find duplicates in Select statement after an if check - sql

I am working on a project that keeps track of repaired cell phones.
In the select statement, I would like to find the duplicate IMEI numbers and check whether the AddedDate values of the duplicates are less than 30 days apart. In other words, the select should list all the phones, including the duplicated IMEI numbers if their AddedDate values are more than 30 days apart.
I hope I described it clearly enough. Thank you.
Additional notes:
I tried including a GROUP BY in a sub-select, which did find the duplicates, but I wasn't able to implement an if condition. Instead, I was going to place all duplicates into a dynamic table and then run a select statement against this table. Before doing so, I thought of posting my question here.
For example DB_Phones has the following rows
ID - AddedDate - IMEI
1 - 01.10.2012 - 123456789012345
2 - 15.10.2012 - 987654321012345
3 - 20.10.2012 - 123456789012345
Based on the table above, I would like to list only the second row (ID# 2) because the last duplicate (ID# 3) wasn't added 30 days after the row with the ID# 1. If rows were as below:
ID - AddedDate - IMEI
1 - 01.10.2012 - 123456789012345
2 - 15.10.2012 - 987654321012345
3 - 20.10.2012 - 123456789012345
4 - 21.11.2012 - 123456789012345
Then the second and fourth row should be returned. I need to return just one of the duplicates (last one) if the 30 day condition is met.
I hope it makes more sense now. Thanks again.

A guess at what you're after:
SELECT
r.*,
(SELECT COUNT(*) FROM Repairs r2 WHERE r.IMEI = r2.IMEI
AND r.ID != r2.ID) as NumberOfAllDuplicates,
(SELECT COUNT(*) FROM Repairs r2 WHERE r.IMEI = r2.IMEI
AND ABS(DATEDIFF(day, r.AddedDate, r2.AddedDate)) < 30
AND r.ID != r2.ID) as NumberOfNearDuplicates
FROM
Repairs r
This depends on having an ID field and on everything living in one table. Because of the correlated subqueries, it may not be very fast on large tables.
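One possible reading of the requirement is: list a row only if no other row with the same IMEI was added within 30 days of it, and only if it is the newest row for that IMEI. The sketch below tries that reading against the two sample tables from the question. It is an illustration under assumptions, not the answerer's query: it uses Python's sqlite3 module, the function name phones_to_list is hypothetical, dates are rewritten in ISO format, and SQLite's julianday() stands in for SQL Server's DATEDIFF.

```python
import sqlite3

def phones_to_list(rows):
    """Return IDs with no same-IMEI row within 30 days, newest row per IMEI only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Repairs (ID INTEGER PRIMARY KEY, AddedDate TEXT, IMEI TEXT)"
    )
    conn.executemany("INSERT INTO Repairs VALUES (?, ?, ?)", rows)
    query = """
        SELECT r.ID
        FROM Repairs r
        WHERE NOT EXISTS (            -- no "near duplicate" within 30 days
            SELECT 1 FROM Repairs r2
            WHERE r2.IMEI = r.IMEI
              AND r2.ID != r.ID
              AND ABS(julianday(r2.AddedDate) - julianday(r.AddedDate)) < 30)
          AND NOT EXISTS (            -- keep only the last row per IMEI
            SELECT 1 FROM Repairs r3
            WHERE r3.IMEI = r.IMEI
              AND r3.AddedDate > r.AddedDate)
        ORDER BY r.ID
    """
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids

# Sample rows from the question, with dates converted to ISO format (assumption).
first_example = [
    (1, "2012-10-01", "123456789012345"),
    (2, "2012-10-15", "987654321012345"),
    (3, "2012-10-20", "123456789012345"),
]
second_example = first_example + [(4, "2012-11-21", "123456789012345")]

print(phones_to_list(first_example))   # [2]
print(phones_to_list(second_example))  # [2, 4]
```

Both results match the rows the question asks for, but other readings of the 30-day rule are possible and would need a different filter.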

Related

How do I stop my query from pulling duplicates?

Yes, I know this seems simple:
SELECT DISTINCT(...)
Except, it apparently isn't
Here is my actual Query:
SELECT
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
CompletedTrainings.DecShotDate,
CompletedTrainings.DecShotLocation,
CompletedTrainings.DecReason,
CompletedTrainings.DecExplanation,
IIf([DecShotLocation]="MCS","Yes","No") AS YesMCS,
IIf([DecReason]=1,1,0) AS YesAllergy,
IIf([DecReason]=2,1,0) AS YesImmune,
IIf([DecReason]=3,1,0) AS YesAdverse,
IIf([DecReason]=4,1,0) AS YesMedical,
IIf([DecReason]=5,1,0) AS YesSpiritual,
IIf([DecReason]=6,1,0) AS YesOther,
IIf([DecReason]=7,1,0) AS YesAlready
FROM
EmployeeInformation
INNER JOIN (CompletedTrainings
LEFT JOIN DeclinationReasons ON CompletedTrainings.DecReason = DeclinationReasons.ReasonID)
ON EmployeeInformation.ID = CompletedTrainings.Employee
GROUP BY
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
CompletedTrainings.DecShotDate,
CompletedTrainings.DecShotLocation,
CompletedTrainings.DecReason,
CompletedTrainings.DecExplanation,
IIf([DecShotLocation]="MCS","Yes","No"),
IIf([DecReason]=1,1,0),
IIf([DecReason]=2,1,0),
IIf([DecReason]=3,1,0),
IIf([DecReason]=4,1,0),
IIf([DecReason]=5,1,0),
IIf([DecReason]=6,1,0),
IIf([DecReason]=7,1,0)
HAVING
((((EmployeeInformation.Active) Like -1)
AND ((CompletedTrainings.DecShotDate + 365 >= DATE())
OR (CompletedTrainings.DecShotDate IS NULL))));
This joins a few tables (obviously) in order to get a number of records. The problem is that if someone is duplicated in the table, with a NULL in the date field on one record and a date on another, the query pulls both the NULL and the date, or pulls multiple NULLs. It might also pull multiple dates, but none of those are present at the moment.
I need the NULLs; they are actual data in this particular case. But if someone has both a date and a NULL, I need to pull only the newest record. I thought I could add MAX(RecordID) from the table, but that didn't change the results of the query either.
That code:
SELECT
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
MAX(CompletedTrainings.RecordID),
CompletedTrainings.DecShotDate
...
And it returned the same issue: duplicated EmployeeInformation.ID values with different DecShotDate values.
Currently it returns:
ID | Active | DecShotDate | etc. x a bunch
1 | -1 | date date | whatever goes
2 | -1 | | in these
2 | -1 | date date | columns
These are being used in a report that determines the total number of employees who fit the report's criteria. The NULLs in DecShotDate are needed because they show people who did not refuse a flu vaccine in the current year, while the dates are people who did refuse.
Now I have come up with one simple solution: I could add a column to the CompletedTrainings table that contains a date or other value, and add that to the HAVING clause. This might be the right solution, as this is a yearly training questionnaire that employees have to fill out. But I am asking for advice before doing this.
Am I right in thinking I need to add a column to filter by so that older data isn't being pulled? Or should I be able to do this by pulling RecordID, and did I just bork that part of the query up?
Edited to add raw table views:
EmployeeInformation Table:
ID | Last | First | empID | Active | Termdate | DoH | Title | PT/FT/PD | PI
1 | Doe | Jane | 982 | -1 | | date | Sr | PD | X
2 | Roe | John | 278 | 0 | date | date | Jr | PD | X
3 | Moe | Larry | 1232 | -1 | | date | Sr | FT | X
4 | Zoe | Debbie | 1424 | -1 | | date | Sr | PT | X
DeclinationReasons Table:
ReasonID | Reason
1 | Allergy
2 | Already got it
3 | Illness
CompletedTrainings Table:
RecordID | Employee | Training | ... | DecShotdate | DecShotLocation | DecShotReason | DecExp
1 | 1 | 4 | | date | location | 2 | text
2 | 1 | 4 | | | | |
3 | 2 | 4 | | | | |
4 | 3 | 4 | | date | location | 3 | text
5 | 3 | 4 | | date | location | 1 | text
6 | 4 | 4 | | | | |
After some serious soul searching, I decided to use another column and filter by that.
In the end my query looks like this:
SELECT *
FROM (
(
SELECT RecordID, DecShotDate, DecShotLocation, DecReason, DecExplanation, Employee,
IIf([DecShotLocation]="MCS","Yes","No") AS YesMCS, IIf([DecReason]=1,1,0) AS YesAllergy,
IIf([DecReason]=2,1,0) AS YesImmune, IIf([DecReason]=3,1,0) AS YesAdverse,
IIf([DecReason]=4,1,0) AS YesMedical, IIf([DecReason]=5,1,0) AS YesSpiritual,
IIf([DecReason]=6,1,0) AS YesOther, IIf([DecReason]=7,1,0) AS YesAlready
FROM CompletedTrainings WHERE (CompletedDate > DATE() - 365 ) AND (Training = 69)) AS T1
LEFT JOIN
(
SELECT ID, Active FROM EmployeeInformation) AS T2 ON T1.Employee = T2.ID)
LEFT JOIN
(
SELECT Reason, ReasonID FROM DeclinationReasons) AS T3 ON T1.DecReason = T3.ReasonID;
This may not have been the best solution, but it did exactly what I needed, which is to get the information by latest entry into the database.
Previously I had tried to use MAX(), DISTINCT(), etc., but always had the problem of multiple records being retrieved. In this case, I intentionally SELECT the most recent records first, then join them to the results of the next query, and so on, until I have all the required data for my report.
I write this in hopes someone else finds it useful. Or even better if someone tells me why this is wrong, so as to improve my own skills.
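For comparison, the MAX(RecordID) idea from the question can work if the maximum is computed per employee in its own subquery and then joined back, instead of being added to a GROUP BY that still contains DecShotDate. The sketch below illustrates this with Python's sqlite3 module. It is an assumption-laden illustration, not the asker's final query: the table is reduced to three columns, the function name is hypothetical, and the date values are invented placeholders because the question only shows "date".

```python
import sqlite3

def latest_record_per_employee(records):
    """Return (RecordID, Employee, DecShotDate) for each employee's highest RecordID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CompletedTrainings ("
        "RecordID INTEGER PRIMARY KEY, Employee INTEGER, DecShotDate TEXT)"
    )
    conn.executemany("INSERT INTO CompletedTrainings VALUES (?, ?, ?)", records)
    query = """
        SELECT ct.RecordID, ct.Employee, ct.DecShotDate
        FROM CompletedTrainings AS ct
        INNER JOIN (
            -- one row per employee: the newest RecordID
            SELECT Employee, MAX(RecordID) AS LatestID
            FROM CompletedTrainings
            GROUP BY Employee
        ) AS latest
        ON ct.RecordID = latest.LatestID
        ORDER BY ct.Employee
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

# Same shape as the CompletedTrainings sample; dates are placeholders (assumption).
sample = [
    (1, 1, "2021-10-01"),
    (2, 1, None),
    (3, 2, None),
    (4, 3, "2021-10-05"),
    (5, 3, "2021-10-07"),
    (6, 4, None),
]

for row in latest_record_per_employee(sample):
    print(row)
# (2, 1, None)
# (3, 2, None)
# (5, 3, '2021-10-07')
# (6, 4, None)
```

This assumes a higher RecordID always means a newer entry. If that does not hold, the filter column described above is the safer choice.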

Repetition of weekend's pattern after every 7 days in SQL

I have a requirement to repeat the week-off pattern in table2 based on the week-off frequency given in table1.
Frequency: a number that is always a multiple of 7 (7, 14, 21, 28 and so on).
Week-Off: each employee can have any number of week-off rows.
Please find the SQL fiddle for demo
http://sqlfiddle.com/#!18/7cb68a/2
In the screenshot, if you look at the "WhatIsGetting" field, it works only for the first two week-offs; after that it returns NULL because RuleTableTemp.ShiftId no longer matches TempMainTable.ShiftId.
I need an expert's help to repeat the week-off over a given date range based on RuleTableTemp.WeekOffFrequencyInDays.
For now, in the demo I have hard-coded '7' as the week-off frequency, like this:
((te.Id / 7) + 1 )
Please see the screenshot for more clarification.
Feel free to ask if anything is misleading or unclear.
Note: for now I have used only one employee as an example. In the real scenario there could be any number of employees, and the week-off should repeat for each employee over the given date range based on that employee's week-off frequency.
Conditions or points to remember:
RuleTableTemp: for now we have two ShiftPatterns for each employee, but this may vary; there could be 3 or 4 patterns too.
The RuleTableTemp field WeekOffFrequencyInDays has the value '7' for EmployeeId 4536, but it can vary per employee. If there are, for example, 4 entries for employee 4536, the WeekOffFrequencyInDays value will be the same for all of them.
Example 1:
if (RuleTableTemp.WeekOffFrequencyInDays == 7 ) {
// ShiftPattern's count is 2:
// ShiftPattern will switch after every 7 days.
}
Example 2:
if ( RuleTableTemp.WeekOffFrequencyInDays == 14) {
// if ShiftPattern's count is 3:
// ShiftPattern will keep switching between 3 patterns after every 14 days
}
Example 3:
if ( RuleTableTemp.WeekOffFrequencyInDays == 21) {
// if ShiftPattern's count is 1 means no switching is required
}
I spent almost an hour explaining my requirements, but somebody still down-voted the question instead of asking what was unclear... so sad to see this... :(
This answers the original version of the question.
This logic should match the shifts:
SELECT mt.* ,
(SELECT rtt.ShiftPattern
FROM RuleTableTemp rtt
WHERE rtt.EmployeeID = mt.EmployeeId AND
rtt.id = (((mt.seqnum - 1) % 14) / 7) + 1
) as Sh
FROM (SELECT mt.*,
ROW_NUMBER() OVER (PARTITION BY mt.EmployeeId ORDER BY id) as seqnum
FROM TempMainTable mt
) mt;
Note that I added an explicit sequence number on the main table. This is just to be sure that the numbers are doing what you really want (automatically generated ids can be a problem).
The key to the logic is modulo arithmetic: take the remainder when the sequence number is divided by 14, then use that to match the week.
Here is a db<>fiddle.
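The expression (((seqnum - 1) % 14) / 7) + 1 hard-codes a frequency of 7 days and 2 patterns (14 = 7 * 2). A small Python sketch of the same arithmetic with both values as parameters may make the generalisation easier to see. The function and parameter names are hypothetical, and it assumes the pattern rows for an employee are numbered 1..N, as the rtt.id comparison in the query implies.

```python
def shift_pattern_index(seqnum, frequency_days, pattern_count):
    """Map a 1-based day sequence number to a 1-based shift pattern number.

    The full cycle is frequency_days * pattern_count days long; inside a
    cycle, each block of frequency_days days belongs to one pattern.
    """
    cycle_length = frequency_days * pattern_count
    return ((seqnum - 1) % cycle_length) // frequency_days + 1

# Frequency 7 with 2 patterns reproduces the hard-coded expression in the query.
print([shift_pattern_index(n, 7, 2) for n in (1, 7, 8, 14, 15)])   # [1, 1, 2, 2, 1]

# Frequency 14 with 3 patterns (Example 2 in the question).
print([shift_pattern_index(n, 14, 3) for n in (14, 15, 29, 43)])   # [1, 2, 3, 1]
```

With pattern_count = 1 the function always returns 1, which matches Example 3 in the question (no switching).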

SQL Server query order by sequence serie

I am writing a query and I want it to order by a repeating series. The first seven records should be ordered by 1, 2, 3, 4, 5, 6 and 7, and then it should start all over.
I have tried OVER (PARTITION BY ...) and LAST_VALUE, but I can't figure it out.
This is the SQL code:
set language swedish;
select
tblridgruppevent.id,
datepart(dw,date) as daynumber,
tblRidgrupper.name
from
tblRidgruppEvent
join
tblRidgrupper on tblRidgrupper.id = tblRidgruppEvent.ridgruppid
where
ridgruppid in (select id from tblRidgrupper
where corporationID = 309 and Removeddate is null)
and tblridgruppevent.terminID = (select id from tblTermin
where corporationID = 309 and removedDate is null and isActive = 1)
and tblridgrupper.removeddate is null
order by
datepart(dw, date)
and this is an example of the result:
5887 1 J2
5916 1 J5
6555 2 Junior nybörjare
6004 2 Morgonridning
5911 3 J2
6467 3 J5
and this is what I would expect:
5887 1 J2
6555 2 Junior nybörjare
5911 3 J2
5916 1 J5
6004 2 Morgonridning
6467 3 J5
You might get some value by zooming out a little further and considering what you're trying to do and how else you might do it. SQL tends to perform very poorly with row-by-row processing, as well as with operations where a row borrows details from the row before it. You could also run into problems if you need to change the range you repeat at (switching from 7 to 10 or 4, etc.).
If you need a number there somewhat arbitrarily still, you could add ROW_NUMBER combined with a modulo to get a repeating increment, then add it to your select/where criteria. It would look something like this:
((ROW_NUMBER() OVER(ORDER BY column ASC) -1) % 7) + 1 AS Number
The outer +1 is to display the results as 1-7 instead of 0-6, and the inner -1 deals with the off by one issue (the column starting at 2 instead of 1). I feel like there's a better way to deal with that, but it's not coming to me at the moment.
edit: Looking over your post again, it looks like you're dealing with days of the week. You can order by Date even if it's not shown in the select statement, that might be all you need to get this working.
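To see what the ROW_NUMBER-plus-modulo expression produces, here is the same arithmetic as a tiny Python sketch. The function name and the cycle parameter are hypothetical; cycle plays the role of the 7 in the SQL.

```python
def repeating_number(row_number, cycle=7):
    """Turn a 1-based row number into a value that repeats 1..cycle."""
    # The inner -1 shifts to 0-based before the modulo; the outer +1 shifts back.
    return ((row_number - 1) % cycle) + 1

print([repeating_number(n) for n in range(1, 10)])
# [1, 2, 3, 4, 5, 6, 7, 1, 2]
```

Changing the range you repeat at is then just a matter of passing a different cycle, for example repeating_number(n, cycle=4).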
The first seven records should be ordered by 1,2,3,4,5,6 and 7. And then it should start all over.
You can use row_number():
order by row_number() over (partition by DATEPART(dw, date) order by tblridgruppevent.id),
datepart(dw, date)
The second key keeps the order within a group.
You don't specify how the rows should be chosen for each group. It is not clear from the question.
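The effect of ordering by a per-day row number first and the day second can be sketched in plain Python on the six sample rows from the question. This is only an illustration of the ordering logic, with a hypothetical function name; it numbers rows within each day by id, as the ORDER BY above does.

```python
from collections import defaultdict

def interleave_by_day(rows):
    """Order (id, daynumber, name) rows by rank within the day, then by day."""
    seen = defaultdict(int)
    keyed = []
    # Walk each day's rows in id order to assign the per-day row number.
    for row in sorted(rows, key=lambda r: (r[1], r[0])):
        seen[row[1]] += 1
        keyed.append((seen[row[1]], row[1], row))
    # First key: row number within the day; second key: the day itself.
    return [row for _, _, row in sorted(keyed, key=lambda k: (k[0], k[1]))]

rows = [
    (5887, 1, "J2"),
    (5916, 1, "J5"),
    (6555, 2, "Junior nybörjare"),
    (6004, 2, "Morgonridning"),
    (5911, 3, "J2"),
    (6467, 3, "J5"),
]

print([row[0] for row in interleave_by_day(rows)])
# [5887, 6004, 5911, 5916, 6555, 6467]
```

The days now cycle 1, 2, 3, 1, 2, 3, but the two day-2 rows come out in the opposite order from the expected result in the question. That reflects the open point above: the question does not say which row of each day belongs to which group, so the tie-break inside a day (here the id) is a guess.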

SQL MS Access just show 2 Data with same ID

My problem is that I only want to show two rows of data for each ID.
My Table looks like this:
------------------------------
- FlatDestID - Trefferspalte -
- 444555666 - K -
- 444555666 - 1 -
- 444555666 - 1 -
- 111222333 - K -
- 111222333 - 1 -
- 111222333 - 1 -
------------------------------
And I want my table to look like this:
------------------------------
- FlatDestID - Trefferspalte -
- 444555666 - K -
- 444555666 - 1 -
- 111222333 - K -
- 111222333 - 1 -
------------------------------
Sometimes I have the same ID four to five times, and I just want to show the first two rows for each FlatDestId, as in the second example. That means if the same FlatDestId appears 5 times, show only the row with "Trefferspalte = K" and then a second row with the same FlatDestId and "Trefferspalte = 1".
I hope you can understand my question and the problem I need a solution for.
Greetings from Germany
---------- EDIT ----------
Trefferspalte holds the values K and 1-6, and I always want to see the K row and then one more row for the same FlatDestId with 1, 2, 3, 4, 5 or 6!
The DISTINCT solution shows FlatDestId with every Trefferspalte value that exists for it, not K plus just one other Trefferspalte. I need exactly two values.
Access doesn't handle deeply nested SQL queries well, so first create a query with this SQL:
SELECT K.FlatDestID, T.AnyTrefferspalte
FROM
(
SELECT DISTINCT FlatDestID
FROM <yourTable>
WHERE Trefferspalte = 'K'
) K
INNER JOIN
(
SELECT FlatDestID, MIN(Trefferspalte) AS AnyTrefferspalte
FROM <yourTable>
WHERE Trefferspalte <> 'K'
GROUP BY FlatDestID
) T
ON K.FlatDestID = T.FlatDestID
The goal is to find the FlatDestID values that have both a 'K' Trefferspalte row and at least one other row. If there are multiple non-K Trefferspalte values, MIN(Trefferspalte) keeps the smallest one. You can change it to MAX() to keep the highest value, or to FIRST() to keep the first one encountered, which in practice means an arbitrary one.
Name the query whatever you want; I have chosen TempQuery.
Then, this query should give you the expected results:
SELECT FlatDestID, 'K' AS Trefferspalte FROM TempQuery
UNION ALL
SELECT FlatDestID, AnyTrefferspalte AS Trefferspalte FROM TempQuery
ORDER BY FlatDestID
On a side note: your table structure is odd and needs a serious redesign, which would save you a lot of headaches. A primary key would be a good start.
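The two-step approach above can be tried outside Access with a short Python sketch using the sqlite3 module. Several things here are assumptions for illustration: the table name Treffer is hypothetical, the saved query TempQuery is written as a common table expression because SQLite has no saved queries, the sample rows add a few extra non-K values, and a second sort key is added only to make the output order deterministic.

```python
import sqlite3

def k_plus_one_other(rows):
    """Return, per FlatDestID, the 'K' row plus the smallest non-K Trefferspalte."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Treffer (FlatDestID TEXT, Trefferspalte TEXT)")
    conn.executemany("INSERT INTO Treffer VALUES (?, ?)", rows)
    query = """
        WITH TempQuery AS (
            SELECT K.FlatDestID, T.AnyTrefferspalte
            FROM (SELECT DISTINCT FlatDestID
                  FROM Treffer
                  WHERE Trefferspalte = 'K') AS K
            INNER JOIN (SELECT FlatDestID, MIN(Trefferspalte) AS AnyTrefferspalte
                        FROM Treffer
                        WHERE Trefferspalte <> 'K'
                        GROUP BY FlatDestID) AS T
            ON K.FlatDestID = T.FlatDestID
        )
        SELECT FlatDestID, 'K' AS Trefferspalte FROM TempQuery
        UNION ALL
        SELECT FlatDestID, AnyTrefferspalte AS Trefferspalte FROM TempQuery
        ORDER BY FlatDestID, Trefferspalte DESC  -- DESC puts 'K' before the digits
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample = [
    ("444555666", "K"), ("444555666", "1"), ("444555666", "1"), ("444555666", "3"),
    ("111222333", "K"), ("111222333", "2"), ("111222333", "5"),
]

for row in k_plus_one_other(sample):
    print(row)
# ('111222333', 'K')
# ('111222333', '2')
# ('444555666', 'K')
# ('444555666', '1')
```

Each FlatDestID ends up with exactly two rows, regardless of how many duplicates or extra values it had.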
Use DISTINCT to remove duplicates; this will show your 2 rows. When you have more than 2 different values for Trefferspalte, you will have to use some other method.
SELECT DISTINCT FlatDestID, Trefferspalte
FROM <yourTable>
You may want to make both columns a composite primary key so that it throws an error if you receive duplicate values (so they won't be saved in the table). If that's the existing setup and you can't do anything about it, try DISTINCT. If that doesn't work, try grouping (you just need to add a new column to your query):
SELECT FlatDestID, Trefferspalte, COUNT(*) FROM YourTable GROUP BY FlatDestID, Trefferspalte

Efficiently identify all FK items with n>3 dates within any 8 week period from a SQL table?

I have a ~400,000 row table containing the dates at which a collection of ~30,000 people had appointments. Each row has the patient ID number and an appointment date. I want to efficiently select people who had at least 4 appointments in an 8 week span. Ideally, I would also flag the appointments that were within this 8 week span as I did so. I am working in a server environment that does not allow CLR aggregate functions. Is this possible to do in SQL server? If so, how?
What I've thought about:
If I could write my own aggregate function to do this via GROUP BY that would obviously be best - but I can't seem to find any way to do it with the built in aggregate functions.
I can add a column to my original table giving a date 8 weeks out from any given appointment, but can't come up with any way that doesn't involve a for loop to then ask the question row by row whether there are at least 3 other appointments within that window.
Finally, I've even thought that perhaps I could just do GROUP BY but somehow create 100 new columns (as there are up to that many appointments for some patients) to create a table that contains every appointment indexed by patient, but even as a SQL newbie I'm pretty sure that as soon as I get to the point of imagining adding 100 new columns I'm going down the wrong road....
For clarity of discussion, here is some notation:
MyTable:
ApptID PatientID ApptDate (in smalldatetime)
--------------------------------------------------
Apt1 Pt1 Datetime1
Apt2 Pt1 Datetime2
Apt3 Pt2 Datetime3
... ... ...
Desired output (one option):
PatientID 4aptsIn8weeks? (Boolean) InitialApptDateForWin
Pt1 1 Datetime1
Pt2 0 NULL
Pt3 1 Datetime3
...
Desired output (another option):
ApptID PatientID ApptDate InAn8wkWindow? InitialApptDateForWin
Apt1 Pt1 Datetime1 1 Datetime1
Apt2 Pt1 Datetime2 1 Datetime1
Apt3 Pt2 Datetime3 0 NULL
... ... ...
But really, any output format that will in the end let me select patients and appointments that meet this criterion would be dandy....
Thanks for any ideas!
EDIT: Here's a slightly decompressed outline of my implementation of the selected answer below, just in case the details are helpful for anyone else (being new to SQL, it took me a couple stabs to get it working):
WITH MyTableAlias AS (
SELECT * FROM MyTable
)
SELECT MyTableAlias.PatientID, MyTable.Apptdate AS V1,
MyTableAlias.Apptdate AS V2
INTO temp1
FROM MyTable INNER JOIN MyTableAlias
ON (
MyTable.PatientID = MyTableAlias.PatientID
AND (DATEDIFF(Wk,MyTable.Apptdate,MyTableAlias.Apptdate) <=8 )
);
-- Since this gives for any given two visit dates 3 hits
-- (V1-V1, V1-V2, V2-V2), delete the ones where the second visit is being
-- selected as V1:
DELETE FROM temp1
WHERE V2<V1;
-- So far we have just selected pairs of visits within an 8 week
-- span of each other, including an entry for each visit being
-- within 8 weeks of itself, but for the rest only including the item
-- where the second visit is after the first. Now we want to look
-- for examples of first visits where there are at least 4 hits:
SELECT PatientID, V1, MAX(V2) AS lastvisitinspan, DATEDIFF(Wk,V1,MAX(V2))
AS nWeeksInSpan, COUNT(*) AS nVisitsInSpan
INTO MyOutputTable
FROM temp1
GROUP BY PatientID, V1
HAVING COUNT(*)>3;
-- From here on it's just a matter of how I want to handle patients with two
-- separate V1 examples meeting criteria...
Rough outline of the query:
INNER JOIN the table ("table") with itself ("alias"), the ON clause would be:
table.patientid = alias.patientid
table.appointment_date < alias.appointment_date
datediff(table.appointment_date, alias.appointment_date) <= 8 week
Then GROUP BY table.patientid, table.appointment_date
Output table.patientid, table.appointment_date, MAX(alias.appointment_date), COUNT(*)
Add a HAVING COUNT(*) > n clause
There are some issues though:
With 400,000 rows, the JOIN could produce a very large result set.
It will count some date ranges twice. E.g. if there were 4 visits in a 9-week period, it will return two rows (#1, #2, #3 and #2, #3, #4).
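The outline above can be sketched end to end with Python's sqlite3 module. This is a minimal illustration under stated assumptions rather than the SQL Server query itself: the function name and sample dates are invented, 8 weeks is treated as 56 days via SQLite's julianday() (SQL Server's DATEDIFF(wk, ...) counts week boundaries and can differ), each appointment is counted as being in its own window as in the asker's implementation, and a patient is assumed not to have two appointments on the same date.

```python
import sqlite3

def windows_with_enough_visits(appointments, min_visits=4, window_days=56):
    """Return (PatientID, window start, last visit in window, visit count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (PatientID TEXT, ApptDate TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?)", appointments)
    query = """
        SELECT t.PatientID,
               t.ApptDate      AS V1,
               MAX(a.ApptDate) AS LastVisitInSpan,
               COUNT(*)        AS nVisitsInSpan
        FROM MyTable AS t
        INNER JOIN MyTable AS a
            ON  t.PatientID = a.PatientID
            AND a.ApptDate >= t.ApptDate   -- only look forward from V1
            AND julianday(a.ApptDate) - julianday(t.ApptDate) <= ?
        GROUP BY t.PatientID, t.ApptDate
        HAVING COUNT(*) >= ?
        ORDER BY t.PatientID, t.ApptDate
    """
    result = conn.execute(query, (window_days, min_visits)).fetchall()
    conn.close()
    return result

# Invented sample data: Pt1 has four visits inside 8 weeks, Pt2 does not.
appointments = [
    ("Pt1", "2020-01-01"), ("Pt1", "2020-01-15"), ("Pt1", "2020-02-01"),
    ("Pt1", "2020-02-20"), ("Pt1", "2020-06-01"),
    ("Pt2", "2020-01-01"), ("Pt2", "2020-03-15"), ("Pt2", "2020-06-01"),
    ("Pt2", "2020-09-01"),
]

print(windows_with_enough_visits(appointments))
# [('Pt1', '2020-01-01', '2020-02-20', 4)]
```

As noted in the issues above, a patient with many closely spaced visits will appear once per qualifying start date, so a further step is still needed to collapse overlapping windows. On the real 400,000-row table, an index on (PatientID, ApptDate) would likely matter for the join.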