I have a problem with duplicate records in a SQL Server 2014 database.
Users receive a small postcard with a parcel number printed on it.
The postcard also shows a link to a simple form they can use to register their parcel.
Unfortunately, the form has no validation to ensure that the same parcel is not submitted more than once.
I currently have no control over the web form, and I am not sure how long it will take the responsible team to add validation to it.
So I have to come up with a routine that deactivates the duplicate records and keeps only one.
This has to be a query that processes records in bulk; no tokens are passed to the routine.
When the web form is submitted, it creates a record ID in sequential order and assigns an application status of 'Registered'.
I think the way to correct this is to keep the record with the highest record ID per parcel and deactivate the rest:
Deactivate all but the most recent record by setting RECORD_STATUS to 'I'.
Set APPLICATION_STATUS to 'Closed' on all but the most recent record.
The query I use returns four columns: Record ID, Parcel Number, Record Status, and Application Status:
SELECT
B.[RECORD_ID],
B.[PARCEL_NBR],
B.[RECORD_STATUS], -- The value of this column would be "I" for the duplicate records.
B.[APPLICATION_STATUS]
FROM
A_TABLE A
INNER JOIN B_TABLE B
ON A.PARCEL_NBR = B.PARCEL_NBR
AND (A.APPLICATION_STATUS IS NULL
OR B.APPLICATION_STATUS = 'Registered');
Initial Output:
RECORD_ID PARCEL_NBR RECORD_STATUS APPLICATION_STATUS
REC-00081 0608012098 A Registered
REC-00082 0608012098 A Registered
REC-00083 0608012098 A Registered
Expected Output:
RECORD_ID PARCEL_NBR RECORD_STATUS APPLICATION_STATUS
REC-00081 0608012098 I Closed   (this record got updated)
REC-00082 0608012098 I Closed   (this record got updated)
REC-00083 0608012098 A Registered
I think a cursor might be part of the solution, but honestly I am not sure. I kindly ask for your help.
You can use window functions and CASE logic. Note that this query only shows the new values; it does not modify the table:
SELECT B.[RECORD_ID], B.[PARCEL_NBR],
(CASE WHEN ROW_NUMBER() OVER (PARTITION BY B.PARCEL_NBR ORDER BY B.RECORD_ID DESC) > 1
THEN 'I' ELSE B.[RECORD_STATUS]
END) as RECORD_STATUS,
(CASE WHEN ROW_NUMBER() OVER (PARTITION BY B.PARCEL_NBR ORDER BY B.RECORD_ID DESC) > 1
THEN 'Closed' ELSE B.APPLICATION_STATUS
END) as APPLICATION_STATUS
FROM A_TABLE A JOIN
B_TABLE B
ON A.PARCEL_NBR = B.PARCEL_NBR AND
(A.APPLICATION_STATUS IS NULL OR B.APPLICATION_STATUS = 'Registered');
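To check the window-function logic without touching a real database, here is a minimal sketch that runs the same preview from Python's sqlite3 module against an in-memory copy of B_TABLE. It is not the SQL Server setup from the question: it assumes SQLite 3.25 or later (for ROW_NUMBER()), loads only the sample rows shown above, and leaves out A_TABLE because its role in the filter is unclear.

```python
import sqlite3

# In-memory stand-in for B_TABLE (assumption: only the four columns from the question).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE B_TABLE (RECORD_ID TEXT, PARCEL_NBR TEXT,"
    " RECORD_STATUS TEXT, APPLICATION_STATUS TEXT)"
)
conn.executemany(
    "INSERT INTO B_TABLE VALUES (?, ?, 'A', 'Registered')",
    [("REC-00081", "0608012098"),
     ("REC-00082", "0608012098"),
     ("REC-00083", "0608012098")],
)

# Number each parcel's rows from the highest RECORD_ID down; row 1 is the one to keep.
# Zero-padded IDs such as REC-00081 sort correctly as text.
PREVIEW_SQL = """
SELECT RECORD_ID, PARCEL_NBR,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY PARCEL_NBR ORDER BY RECORD_ID DESC) > 1
            THEN 'I' ELSE RECORD_STATUS END AS RECORD_STATUS,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY PARCEL_NBR ORDER BY RECORD_ID DESC) > 1
            THEN 'Closed' ELSE APPLICATION_STATUS END AS APPLICATION_STATUS
FROM B_TABLE
ORDER BY RECORD_ID
"""
rows = conn.execute(PREVIEW_SQL).fetchall()
for row in rows:
    print(row)
```

The newest record keeps its original values; every older record for the same parcel is shown as 'I' / 'Closed'.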
I'm not sure what role A_TABLE plays in this, but this may give you what you want:
update B_TABLE
set record_Status = 'I'
, application_status = 'Closed'
where record_status = 'A'
and application_status = 'Registered'
and record_id <> (select max(record_id)
from B_TABLE b
where b.parcel_nbr = B_TABLE.parcel_nbr
and b.record_status = 'A'
and b.application_status = 'Registered');
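The correlated-subquery UPDATE can be exercised the same way. The sketch below runs it with Python's sqlite3 module against an in-memory table rather than SQL Server; the extra parcel 0608012099 with a single record is hypothetical and only there to show that non-duplicated parcels are left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE B_TABLE (record_id TEXT, parcel_nbr TEXT,"
    " record_status TEXT, application_status TEXT)"
)
conn.executemany(
    "INSERT INTO B_TABLE VALUES (?, ?, 'A', 'Registered')",
    [("REC-00081", "0608012098"),
     ("REC-00082", "0608012098"),
     ("REC-00083", "0608012098"),
     ("REC-00090", "0608012099")],  # hypothetical parcel with no duplicates
)

# Close every active, registered row that is not the newest one for its parcel.
# Inside the subquery the table is aliased b, so B_TABLE.parcel_nbr refers to
# the row currently being updated.
conn.execute("""
UPDATE B_TABLE
SET record_status = 'I',
    application_status = 'Closed'
WHERE record_status = 'A'
  AND application_status = 'Registered'
  AND record_id <> (SELECT MAX(b.record_id)
                    FROM B_TABLE b
                    WHERE b.parcel_nbr = B_TABLE.parcel_nbr
                      AND b.record_status = 'A'
                      AND b.application_status = 'Registered')
""")
conn.commit()

rows = conn.execute(
    "SELECT record_id, record_status, application_status FROM B_TABLE ORDER BY record_id"
).fetchall()
print(rows)
```

Because the row holding MAX(record_id) is never modified, the subquery returns the same value for every row of a parcel, regardless of the order in which rows are updated.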
Related
I have two tables I'm trying to conditionally JOIN.
dbo.Users looks like this:
UserID
------
24525
5425
7676
dbo.TelemarketingCallAudits looks like this (date format dd/mm/yyyy):
UserID Date CampaignID
------ ---------- ----------
24525 21/01/2018 1
24525 26/08/2018 1
24525 17/02/2018 1
24525 12/01/2017 2
5425 22/01/2018 1
7676 16/11/2017 2
I'd like to return a table that contains ONLY users that I called at least 30 days ago (if CampaignID=1) and at least 70 days ago (if CampaignID=2).
The end result should look like this (today is 02/09/18):
UserID Date CampaignID
------ ---------- ----------
5425 22/01/2018 1
7676 16/11/2017 2
Note that because I called user 24525 with Campaign 1 only 7 days ago, I shouldn't see that user at all.
I tried this simple AND/OR condition and found out that it still returns users I shouldn't see: they have other call rows that satisfy the condition, so the rows that fail it are simply ignored, which obviously misses the goal.
I have no idea how to exclude a user entirely if ANY of their associated rows in the second table fails the condition.
AND
(
internal_TelemarketingCallAudits.CallAuditID IS NULL --No telemarketing calls is fine
OR
(
internal_TelemarketingCallAudits.CampaignID = 1 --Campaign 1
AND
DATEADD(dd, 75, MAX(internal_TelemarketingCallAudits.Date)) < GETDATE() --Last call occurred at least 75 days ago
)
OR
(
internal_TelemarketingCallAudits.CampaignID != 1 --Other campaigns
AND
DATEADD(dd, 10, MAX(internal_TelemarketingCallAudits.Date)) < GETDATE() --Last call occurred at least 10 days ago
)
)
I really appreciate your help.
Try this: SQL Fiddle
select *
from dbo.Users u
inner join ( --get the most recent call per user (taking into account different campaign timescales)
select tca.UserId
, tca.CampaignId
, tca.[Date]
, case when DateAdd(Day,c.DaysSinceLastCall, tca.[Date]) > getutcdate() then 1 else 0 end LastCalledInWindow
, row_number() over (partition by tca.UserId order by case when DateAdd(Day,c.DaysSinceLastCall, tca.[Date]) > getutcdate() then 1 else 0 end desc, tca.[Date] desc) r
from dbo.TelemarketingCallAudits tca
inner join (
values (1, 30)
, (2, 70)
) c (CampaignId, DaysSinceLastCall)
on tca.CampaignId = c.CampaignId
) mrc
on mrc.UserId = u.UserId
and mrc.r = 1 --only accept the most recent call
and mrc.LastCalledInWindow = 0 --only include if they haven't been contacted in the last x days
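Here is the same idea as a runnable sketch using Python's sqlite3 module and the sample data from the question. Assumptions: SQLite 3.25+ for ROW_NUMBER(), dates stored as ISO text, SQLite's date() function standing in for DateAdd(), a CTE standing in for the inline VALUES table, and a fixed "today" of 2018-09-02 (the date given in the question) instead of getutcdate() so the result is reproducible. The campaign windows are the 30 and 70 days from the question.

```python
import sqlite3

TODAY = "2018-09-02"  # fixed stand-in for getutcdate(), taken from the question

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserId INTEGER)")
conn.execute("CREATE TABLE TelemarketingCallAudits (UserId INTEGER, [Date] TEXT, CampaignId INTEGER)")
conn.executemany("INSERT INTO Users VALUES (?)", [(24525,), (5425,), (7676,)])
conn.executemany(
    "INSERT INTO TelemarketingCallAudits VALUES (?, ?, ?)",
    [(24525, "2018-01-21", 1), (24525, "2018-08-26", 1), (24525, "2018-02-17", 1),
     (24525, "2017-01-12", 2), (5425, "2018-01-22", 1), (7676, "2017-11-16", 2)],
)

QUERY = """
WITH c (CampaignId, DaysSinceLastCall) AS (VALUES (1, 30), (2, 70)),
mrc AS (  -- rank the calls of each user: in-window calls first, then newest first
    SELECT tca.UserId, tca.CampaignId, tca.[Date],
           CASE WHEN date(tca.[Date], '+' || c.DaysSinceLastCall || ' days') > :today
                THEN 1 ELSE 0 END AS LastCalledInWindow,
           ROW_NUMBER() OVER (
               PARTITION BY tca.UserId
               ORDER BY CASE WHEN date(tca.[Date], '+' || c.DaysSinceLastCall || ' days') > :today
                             THEN 1 ELSE 0 END DESC,
                        tca.[Date] DESC) AS r
    FROM TelemarketingCallAudits tca
    JOIN c ON tca.CampaignId = c.CampaignId
)
SELECT u.UserId, mrc.[Date], mrc.CampaignId
FROM Users u
JOIN mrc ON mrc.UserId = u.UserId
        AND mrc.r = 1                   -- only the most important call
        AND mrc.LastCalledInWindow = 0  -- and it is outside its campaign window
ORDER BY u.UserId
"""
rows = conn.execute(QUERY, {"today": TODAY}).fetchall()
print(rows)
```

User 24525 drops out because the 2018-08-26 call is still inside its 30-day window, which matches the expected result in the question.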
I'm not comparing all rows here; rather, I saw that you're interested in when the most recent call was, and then you only care whether that falls in the X-day window. There's a bit of additional complexity because X varies by campaign, so it's not the most recent call you care about so much as the call most likely to fall within its window. To get around that, I sort each user's calls so that those in the window come first, followed by those that aren't, and then sort by most recent first within those two groups. This gives me the field r.
By filtering on r = 1 for each user, we only get the most recent call (adjusted for campaign windows). By filtering on LastCalledInWindow = 0 we exclude those who have been called within the campaign's window.
NB: I've used an inner query (aliased c) to hold the campaign ids and their corresponding windows. In reality you'd probably want a campaigns table holding that same information instead of coding inside the query itself.
Hopefully everything else is self-explanatory; but give me a nudge in the comments if you need any further information.
UPDATE
Just realised you'd also said "no calls is fine"... Here's a tweaked version to allow for scenarios where the person has not been called.
SQL Fiddle Example.
select *
from dbo.Users u
left outer join ( --get the most recent call per user (taking into account different campaign timescales)
select tca.UserId
, tca.CampaignId
, tca.[Date]
, case when DateAdd(Day,c.DaysSinceLastCall, tca.[Date]) > getutcdate() then 1 else 0 end LastCalledInWindow
, row_number() over (partition by tca.UserId order by case when DateAdd(Day,c.DaysSinceLastCall, tca.[Date]) > getutcdate() then 1 else 0 end desc, tca.[Date] desc) r
from dbo.TelemarketingCallAudits tca
inner join (
values (1, 30)
, (2, 70)
) c (CampaignId, DaysSinceLastCall)
on tca.CampaignId = c.CampaignId
) mrc
on mrc.UserId = u.UserId
where
(
mrc.r = 1 --only accept the most recent call
and mrc.LastCalledInWindow = 0 --only include if they haven't been contacted in the last x days
)
or mrc.r is null --no calls at all
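A sketch of the LEFT JOIN variant in Python's sqlite3, under the same assumptions as before (SQLite 3.25+, date() instead of DateAdd(), a fixed "today" of 2018-09-02). User 9999 is hypothetical and has never been called, so mrc.r is NULL for that user and the second branch of the WHERE keeps it.

```python
import sqlite3

TODAY = "2018-09-02"  # fixed stand-in for getutcdate()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserId INTEGER)")
conn.execute("CREATE TABLE TelemarketingCallAudits (UserId INTEGER, [Date] TEXT, CampaignId INTEGER)")
# 9999 is a hypothetical user with no calls at all.
conn.executemany("INSERT INTO Users VALUES (?)", [(24525,), (5425,), (7676,), (9999,)])
conn.executemany(
    "INSERT INTO TelemarketingCallAudits VALUES (?, ?, ?)",
    [(24525, "2018-01-21", 1), (24525, "2018-08-26", 1), (24525, "2018-02-17", 1),
     (24525, "2017-01-12", 2), (5425, "2018-01-22", 1), (7676, "2017-11-16", 2)],
)

QUERY = """
WITH c (CampaignId, DaysSinceLastCall) AS (VALUES (1, 30), (2, 70)),
mrc AS (
    SELECT tca.UserId, tca.CampaignId, tca.[Date],
           CASE WHEN date(tca.[Date], '+' || c.DaysSinceLastCall || ' days') > :today
                THEN 1 ELSE 0 END AS LastCalledInWindow,
           ROW_NUMBER() OVER (
               PARTITION BY tca.UserId
               ORDER BY CASE WHEN date(tca.[Date], '+' || c.DaysSinceLastCall || ' days') > :today
                             THEN 1 ELSE 0 END DESC,
                        tca.[Date] DESC) AS r
    FROM TelemarketingCallAudits tca
    JOIN c ON tca.CampaignId = c.CampaignId
)
SELECT u.UserId, mrc.[Date], mrc.CampaignId
FROM Users u
LEFT JOIN mrc ON mrc.UserId = u.UserId
WHERE (mrc.r = 1 AND mrc.LastCalledInWindow = 0)  -- most important call is outside its window
   OR mrc.r IS NULL                               -- no calls at all
ORDER BY u.UserId
"""
rows = conn.execute(QUERY, {"today": TODAY}).fetchall()
print(rows)
```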
Update: Including a default campaign offset
To include a default, you could do something like the code below (SQL Fiddle Example). Here, I've put each campaign's offset value in the Campaigns table, but created a default campaign with ID = -1 to handle anything for which there is no offset defined. I use a left join between the audit table and the campaigns table so that we get all records from the audit table, regardless of whether there's a campaign defined, then a cross join to get the default campaign. Finally, I use a coalesce to say "if the campaign isn't defined, use the default campaign".
select *
from dbo.Users u
left outer join ( --get the most recent call per user (taking into account different campaign timescales)
select tca.UserId
, tca.CampaignId
, tca.[Date]
, case when DateAdd(Day,coalesce(c.DaysSinceLastCall,dflt.DaysSinceLastCall), tca.[Date]) > getutcdate() then 1 else 0 end LastCalledInWindow
, row_number() over (partition by tca.UserId order by case when DateAdd(Day,coalesce(c.DaysSinceLastCall,dflt.DaysSinceLastCall), tca.[Date]) > getutcdate() then 1 else 0 end desc, tca.[Date] desc) r
from dbo.TelemarketingCallAudits tca
left outer join Campaigns c
on tca.CampaignId = c.CampaignId
cross join Campaigns dflt
where dflt.CampaignId = -1
) mrc
on mrc.UserId = u.UserId
where
(
mrc.r = 1 --only accept the most recent call
and mrc.LastCalledInWindow = 0 --only include if they haven't been contacted in the last x days
)
or mrc.r is null --no calls at all
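A sketch of the default-campaign version in Python's sqlite3. Everything in the data is hypothetical: campaign 3 has no row in Campaigns, so it falls back to the default campaign (-1), which is given an assumed 45-day offset here. The same SQLite assumptions as above apply (date() instead of DateAdd(), a fixed "today").

```python
import sqlite3

TODAY = "2018-09-02"  # fixed stand-in for getutcdate()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserId INTEGER)")
conn.execute("CREATE TABLE TelemarketingCallAudits (UserId INTEGER, [Date] TEXT, CampaignId INTEGER)")
conn.execute("CREATE TABLE Campaigns (CampaignId INTEGER, DaysSinceLastCall INTEGER)")
# Hypothetical data; -1 is the default campaign with an assumed 45-day offset.
conn.executemany("INSERT INTO Campaigns VALUES (?, ?)", [(1, 30), (2, 70), (-1, 45)])
conn.executemany("INSERT INTO Users VALUES (?)", [(5425,), (1234,), (4321,)])
conn.executemany(
    "INSERT INTO TelemarketingCallAudits VALUES (?, ?, ?)",
    [(5425, "2018-01-22", 1), (1234, "2018-08-01", 3), (4321, "2018-06-01", 3)],
)

QUERY = """
SELECT u.UserId, mrc.[Date], mrc.CampaignId
FROM Users u
LEFT JOIN (
    SELECT tca.UserId, tca.CampaignId, tca.[Date],
           CASE WHEN date(tca.[Date], '+' || COALESCE(c.DaysSinceLastCall, dflt.DaysSinceLastCall)
                      || ' days') > :today THEN 1 ELSE 0 END AS LastCalledInWindow,
           ROW_NUMBER() OVER (
               PARTITION BY tca.UserId
               ORDER BY CASE WHEN date(tca.[Date], '+' || COALESCE(c.DaysSinceLastCall,
                                  dflt.DaysSinceLastCall) || ' days') > :today
                             THEN 1 ELSE 0 END DESC,
                        tca.[Date] DESC) AS r
    FROM TelemarketingCallAudits tca
    LEFT JOIN Campaigns c ON tca.CampaignId = c.CampaignId  -- NULL when campaign undefined
    CROSS JOIN Campaigns dflt                               -- the default campaign row
    WHERE dflt.CampaignId = -1
) mrc ON mrc.UserId = u.UserId
WHERE (mrc.r = 1 AND mrc.LastCalledInWindow = 0) OR mrc.r IS NULL
ORDER BY u.UserId
"""
rows = conn.execute(QUERY, {"today": TODAY}).fetchall()
print(rows)
```

With the assumed 45-day default, user 1234 (called on 2018-08-01) is still inside the window and is excluded, while user 4321 (called on 2018-06-01) is returned.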
That said, I'd recommend not using a default, but rather ensuring that every campaign has an offset defined. Presumably you already have a campaigns table, and since this offset value is defined per campaign, you can include a field in that table to hold it. Rather than leaving it NULL for some records, you could set it to your default value, which simplifies the logic and avoids potential issues elsewhere where that value may subsequently be used.
You'd also asked about the order by clause. There is no order by 1/0; so I assume that's a typo. Rather the full statement is row_number() over (partition by tca.UserId order by case when DateAdd(Day,coalesce(c.DaysSinceLastCall,dflt.DaysSinceLastCall), tca.[Date]) > getutcdate() then 1 else 0 end desc, tca.[Date] desc) r.
The purpose of this piece is to find the "most important" call for each user. By "most important" I basically mean the most recent, since that's generally what we're after; though there's one caveat. If a user is part of 2 campaigns, one with an offset of 30 days and one with an offset of 60 days, they may have had 2 calls, one 32 days ago and one 38 days ago. Though the call from 32 days ago is more recent, if that's on the campaign with the 30 day offset it's outside the window, whilst the older call from 38 days ago may be on the campaign with an offset of 60 days, meaning that it's within the window, so is more of interest (i.e. this user has been called within a campaign window).
Given the above requirement, here's how this code meets it:
row_number() produces a number from 1, counting up, for each row in the (sub)query's results. The counter is reset to 1 for each partition.
partition by tca.UserId says that we're partitioning by the user id, so for each user there will be one row for which row_number() returns 1, and each additional row for that user gets the next consecutive number.
The order by part of this statement defines which of each user's rows gets #1, and how the numbers progress thereafter; i.e. the first row according to the order by gets number 1, the next number 2, etc.
case when DateAdd(Day,coalesce(c.DaysSinceLastCall,dflt.DaysSinceLastCall), tca.[Date]) > getutcdate() then 1 else 0 end returns 1 for calls within their campaign's window, and 0 for those outside of the window. Since we're ordering by this result in descending order, any records within their campaign's window are returned before any outside of their campaign's window.
We then order by tca.[Date] desc; i.e. the more recent calls are returned before the older calls.
Finally, we name the output of this row number r, and in the outer query filter on r = 1, meaning that for each user we only take one row: the first according to the order criteria above. In other words, we take the most recent call that is within its campaign's window if there is one, and otherwise the most recent call overall.
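To see the ordering on the 32-day / 38-day example above, here is a small sketch in Python's sqlite3 that prints the subquery's output. The campaign IDs 10 (30-day offset) and 20 (60-day offset), the user ID, and the fixed "today" of 2018-09-02 are all hypothetical; date() stands in for DateAdd().

```python
import sqlite3

TODAY = "2018-09-02"  # fixed stand-in for getutcdate()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Campaigns (CampaignId INTEGER, DaysSinceLastCall INTEGER)")
conn.execute("CREATE TABLE TelemarketingCallAudits (UserId INTEGER, [Date] TEXT, CampaignId INTEGER)")
conn.executemany("INSERT INTO Campaigns VALUES (?, ?)", [(10, 30), (20, 60)])
conn.executemany(
    "INSERT INTO TelemarketingCallAudits VALUES (?, ?, ?)",
    [(1, "2018-08-01", 10),   # 32 days ago, 30-day campaign: outside its window
     (1, "2018-07-26", 20)],  # 38 days ago, 60-day campaign: inside its window
)

SUBQUERY = """
SELECT tca.[Date], tca.CampaignId,
       CASE WHEN date(tca.[Date], '+' || c.DaysSinceLastCall || ' days') > :today
            THEN 1 ELSE 0 END AS LastCalledInWindow,
       ROW_NUMBER() OVER (
           PARTITION BY tca.UserId
           ORDER BY CASE WHEN date(tca.[Date], '+' || c.DaysSinceLastCall || ' days') > :today
                         THEN 1 ELSE 0 END DESC,   -- in-window calls first
                    tca.[Date] DESC) AS r          -- then newest first
FROM TelemarketingCallAudits tca
JOIN Campaigns c ON tca.CampaignId = c.CampaignId
ORDER BY r
"""
rows = conn.execute(SUBQUERY, {"today": TODAY}).fetchall()
for row in rows:
    print(row)
```

The older call gets r = 1 because it is the one still inside its campaign's window, so this user would be excluded by the outer filter.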
Take a look at the output of the subquery to get a better idea of exactly how this works: SQL Fiddle
I hope that explanation makes some sense / helps you to understand the code? Sadly I can't find a way to explain it more concisely than the code itself does; so if it doesn't make sense try playing with the code and seeing how that affects the output to see if that helps your understanding.
I have a system that requests information by sending 3 parameters to an external system: user, start_date and end_date.
I have a table
request (
id,
user,
start_date,
end_date,
status
)
that logs these requests and their status (Done for the requests that have returned, Waiting for the requests that haven't returned yet).
Every few hours I resubmit the requests that haven't returned yet, even though the initial request could still return some time in the future.
After some time, my table will have multiple requests for the same user/start_date/end_date, some of them Waiting, some Done.
What I need is a query that returns the ids of all duplicate requests, except for one Done request, in every user/start_date/end_date group where at least one request has status = Done.
In summary, I need a way to clear the excess requests for a given user/start_date/end_date if at least one of them has status = Done (it doesn't matter which one; I just need to keep one request with status = Done for each user/start_date/end_date).
So far I've been able to pinpoint the duplicate requests that have at least one Done. To select all but one Done from this query, I would most likely wrap the entire query in two more SELECTs and do the magic, but the query as it stands is already really slow. Can someone help me refactor it and select the end result I need?
http://sqlfiddle.com/#!5/10c25a/1
I'm using SQLite
The expected result from the dataset provided in the sqlfiddle is this:
454, 457, 603, (604 or 605 not both), 607, 608
select r.id from request r inner join (
select user, start_date, end_date,
min(case when status = 'Done' then id end) as keep_id
from request
group by user, start_date, end_date
having count(case when status = 'Done' then 1 end) > 0 and count(*) > 1
) s on s.user = r.user and s.start_date = r.start_date and s.end_date = r.end_date
and s.keep_id <> r.id
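A quick way to check this GROUP BY / HAVING approach is to run it from Python's sqlite3 module. The rows below are hypothetical sample data, not the data from the linked fiddle; an ORDER BY is added only to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE request (id INTEGER PRIMARY KEY, user TEXT,"
    " start_date TEXT, end_date TEXT, status TEXT)"
)
# Hypothetical sample rows (not the data from the linked fiddle).
conn.executemany(
    "INSERT INTO request VALUES (?, ?, ?, ?, ?)",
    [(1, "u1", "2018-01-01", "2018-01-31", "Waiting"),
     (2, "u1", "2018-01-01", "2018-01-31", "Done"),
     (3, "u1", "2018-01-01", "2018-01-31", "Waiting"),
     (4, "u1", "2018-01-01", "2018-01-31", "Done"),
     (5, "u2", "2018-01-01", "2018-01-31", "Waiting"),  # no Done: keep all
     (6, "u2", "2018-01-01", "2018-01-31", "Waiting"),
     (7, "u3", "2018-01-01", "2018-01-31", "Done")],    # single row: nothing to clear
)

# For each group with at least one Done and more than one row, keep the
# lowest-id Done request and return every other id in the group.
QUERY = """
SELECT r.id
FROM request r
JOIN (
    SELECT user, start_date, end_date,
           MIN(CASE WHEN status = 'Done' THEN id END) AS keep_id
    FROM request
    GROUP BY user, start_date, end_date
    HAVING COUNT(CASE WHEN status = 'Done' THEN 1 END) > 0 AND COUNT(*) > 1
) s ON s.user = r.user AND s.start_date = r.start_date AND s.end_date = r.end_date
   AND s.keep_id <> r.id
ORDER BY r.id
"""
ids = [row[0] for row in conn.execute(QUERY)]
print(ids)
```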
What you're after are records that match these criteria:
There exists another record with Status "Done"
That other "Done" record matches user, start_date and end_date
That other record has a lower id value (because you need something to identify the record to keep) or the other record has a higher id but the record you're looking at has Status "Waiting"
With all that in mind, here's your query
SELECT id FROM request r1
WHERE EXISTS (
SELECT 1 FROM request r2
WHERE r2.Status = 'Done'
AND r1.user = r2.user
AND r1.start_date = r2.start_date
AND r1.end_date = r2.end_date
AND (r1.id > r2.id OR r1.Status = 'Waiting')
)
ORDER BY id
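The EXISTS version can be checked against the same kind of hypothetical data (again, not the fiddle's rows) with Python's sqlite3. Like the GROUP BY answer, it keeps the lowest-id Done request in each group; a group with a single Done row returns nothing, because that row has no other Done row with a lower id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE request (id INTEGER PRIMARY KEY, user TEXT,"
    " start_date TEXT, end_date TEXT, status TEXT)"
)
# Hypothetical sample rows (not the data from the linked fiddle).
conn.executemany(
    "INSERT INTO request VALUES (?, ?, ?, ?, ?)",
    [(1, "u1", "2018-01-01", "2018-01-31", "Waiting"),
     (2, "u1", "2018-01-01", "2018-01-31", "Done"),
     (3, "u1", "2018-01-01", "2018-01-31", "Waiting"),
     (4, "u1", "2018-01-01", "2018-01-31", "Done"),
     (5, "u2", "2018-01-01", "2018-01-31", "Waiting"),
     (6, "u2", "2018-01-01", "2018-01-31", "Waiting"),
     (7, "u3", "2018-01-01", "2018-01-31", "Done")],
)

QUERY = """
SELECT id FROM request r1
WHERE EXISTS (
    SELECT 1 FROM request r2
    WHERE r2.status = 'Done'
      AND r1.user = r2.user
      AND r1.start_date = r2.start_date
      AND r1.end_date = r2.end_date
      AND (r1.id > r2.id OR r1.status = 'Waiting')  -- a lower-id Done exists, or r1 is still Waiting
)
ORDER BY id
"""
ids = [row[0] for row in conn.execute(QUERY)]
print(ids)
```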
http://sqlfiddle.com/#!5/10c25a/26 ~ produces IDs 454, 457, 603, 605, 607 and 608
I have created a database of "trips" with the following tables:
TripParent
Id
DrivingCompany
Client
TripDetails
Id
Destination
PlannedArrivalDate
StatusLog
Id
CatStatusId (Comes from another table just with the names)
DateTimeModified
Let me explain the tables. First of all, I've hidden other fields to keep it simple. The "Parent" table has MANY TripDetails, so it is just a summary of the many "Details" it has. The TripDetails table has one row per destination: say the trip goes from A to C via B, then we have a row for each "stop" (A, B, C).
Then there is the StatusLog table, which has MANY rows for each TripDetails row.
The problem is, I need a stored procedure that returns the DrivingCompany, Client, PlannedArrivalDate, RealArrivalDate and RealDepartureDate.
The "Real Dates" come from the StatusLog table. Status 1 means that the truck has arrived at the destination (A/B/C), and status 2 means that it has left that location.
So far I got the following
SELECT
TP.DrivingCompany, TP.Client, TD.PlannedArrivalDate,
'Real Arrival Date' = CASE SL.CatStatusId
WHEN 1 THEN SL.DateTimeModified
ELSE NULL
END,
'Real Departure Date' = CASE SL.CatStatusId
WHEN 2 THEN SL.DateTimeModified
ELSE NULL
END
FROM
TripParent TP
JOIN
TripDetails TD ON TD.TripParentId = TP.Id
JOIN
StatusLog SL ON SL.TripDetailsId = TD.Id
GROUP BY
TP.Id
ORDER BY
TD.Id
Is using CASE the correct way to show the same column twice in the SELECT statement? I think I'm on the right track, but I can't group by TP.Id, and I also need to show ALL the rows: with this query, the TripDetails rows that don't have a StatusLog row yet (because the truck hasn't arrived) are not shown.
Any help is appreciated
Try this:
SELECT TP.Id,
TP.DrivingCompany,
TP.Client,
TD.PlannedArrivalDate,
'Real Arrival Date' = MAX(CASE SL.CatStatusId
WHEN 1 THEN SL.DateTimeModified
ELSE NULL
END),
'Real Departure Date' = MAX(CASE SL.CatStatusId
WHEN 2 THEN SL.DateTimeModified
ELSE NULL
END)
FROM TripParent TP
JOIN TripDetails TD ON TD.TripParentId = TP.Id
LEFT JOIN StatusLog SL ON SL.TripDetailsId = TD.Id -- keeps stops that have no log rows yet
GROUP BY TP.Id,
TP.DrivingCompany,
TP.Client,
TD.Id, -- one output row per stop; also required for ORDER BY TD.Id
TD.PlannedArrivalDate
ORDER BY TD.Id
The way you use CASE is perfectly fine, yes. Using a LEFT JOIN for the log entries ensures that you also get the stops that have no corresponding log entries yet.
When grouping, each column you display must be either contained in the GROUP BY clause or wrapped in an aggregate function in the SELECT. So think about why you are grouping: do you want to compute a sum, a count, ...? Then change the two CASE columns accordingly, for example by wrapping each one in MAX() so that the arrival and departure log rows of a stop collapse into a single row.
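Here is a minimal sketch of the MAX(CASE ...) plus LEFT JOIN pattern, run from Python's sqlite3 module instead of SQL Server. The trip, companies, and timestamps are hypothetical, and the TripParentId / TripDetailsId foreign-key columns are assumed from the joins in the question's query. Stop C has no log rows yet and still appears, with NULL real dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample trip A -> B -> C; stop C has no StatusLog rows yet.
conn.executescript("""
CREATE TABLE TripParent (Id INTEGER PRIMARY KEY, DrivingCompany TEXT, Client TEXT);
CREATE TABLE TripDetails (Id INTEGER PRIMARY KEY, TripParentId INTEGER,
                          Destination TEXT, PlannedArrivalDate TEXT);
CREATE TABLE StatusLog (Id INTEGER PRIMARY KEY, TripDetailsId INTEGER,
                        CatStatusId INTEGER, DateTimeModified TEXT);
INSERT INTO TripParent VALUES (1, 'Carrier 1', 'Client 1');
INSERT INTO TripDetails VALUES (10, 1, 'A', '2023-05-01 08:00'),
                               (11, 1, 'B', '2023-05-01 12:00'),
                               (12, 1, 'C', '2023-05-01 18:00');
INSERT INTO StatusLog VALUES (100, 10, 1, '2023-05-01 08:05'),
                             (101, 10, 2, '2023-05-01 08:30'),
                             (102, 11, 1, '2023-05-01 12:10');
""")

# MAX() collapses the arrival (status 1) and departure (status 2) rows of a stop
# into one output row; the LEFT JOIN keeps stops with no log rows.
QUERY = """
SELECT TP.DrivingCompany, TP.Client, TD.Destination, TD.PlannedArrivalDate,
       MAX(CASE SL.CatStatusId WHEN 1 THEN SL.DateTimeModified END) AS RealArrivalDate,
       MAX(CASE SL.CatStatusId WHEN 2 THEN SL.DateTimeModified END) AS RealDepartureDate
FROM TripParent TP
JOIN TripDetails TD ON TD.TripParentId = TP.Id
LEFT JOIN StatusLog SL ON SL.TripDetailsId = TD.Id
GROUP BY TP.Id, TP.DrivingCompany, TP.Client, TD.Id, TD.Destination, TD.PlannedArrivalDate
ORDER BY TD.Id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```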
This is being run in SQL on IBM i Series 7.
I have a table that stores information about orders. Each row has an order number (ON), a part number (PN), and a sequence number (SEQ). Each ON has multiple PNs linked to it, and each part number has multiple SEQ numbers. Each sequence number represents the order in which work is done on the part. Elsewhere in the system, a flag is shown once the part is at a location and ready to be worked on. What I want is a list of orders for a location that have not yet arrived but have been closed out at the previous location (which means the part is on its way).
I have a query below that I believe should work, but I get the following error: "The column qualifier or table t undefined". Where is my issue?
Select * From (SELECT M2ON as Order__Number , M2SEQ as Sequence__Number,
M2PN as Product__Number,ML2OQ as Order__Quantity
FROM M2P
WHERE M2pN in (select R1PN FROM R1P WHERE (RTWC = '7411') AND (R1SEQ = M2SEQ)
)
AND M2ON IN (SELECT M1ON FROM M1P WHERE ML1RCF = '')
ORDER BY ML2OSM ASC) as T
WHERE
T.Order__Number in (Select t3.m2on from (SELECT *
FROM(Select * from m2p
where m2on = t.Order__Number and m2pn = t.Product__Number
order by m2seq asc fetch first 2 rows only
)as t1 order by m2seq asc fetch first row only
) as t3 where t3.m2stat = 'C')
EDIT: Answer for anyone else with this issue
Clutton's answer worked with a slight modification, so thank you to him for the fast response! I had to name my outer table and use that name in the subquery; otherwise the AS/400 would tell me it couldn't find the columns. I also had to order by the sequence number descending so that I grabbed the highest record below the current sequence (otherwise, for example, if my sequence number was 20 it could grab 5 even though 10 was available and should be shown first). Here is the subquery I now use. Please note that the actual query names M2P as T1.
IFNULL((
SELECT
M2STAT
FROM
M2P as M2P_1
WHERE
M2ON = T1.M2ON
AND M2SEQ < T1.M2SEQ
AND M2PN IN (select R1PN FROM R1P WHERE (RTWC = #WC) AND (R1SEQ = T1.M2SEQ))
ORDER BY M2SEQ DESC
FETCH FIRST ROW ONLY
), 'NULL') as PRIOR_M2STAT
Just reading your question, it looks like something I do frequently to emulate RPG READPE op codes. Is the key to M2P Order/Seq? If so, here is a basic piece that may help you build out the rest of the query.
I am assuming that you are trying to get the prior record by key using SQL. In RPG this would be like doing a READPE on the key for a file with Order/Seq key.
Here is an example using a subquery to get the status field of the prior record.
SELECT
M2ON, M2PN, M2OQ, M2STAT,
IFNULL((
SELECT
M2STAT
FROM
M2P as M2P_1
WHERE
M2P_1.M2ON = M2ON
AND M2P_1.M2SEQ < M2SEQ
FETCH FIRST ROW ONLY
), '') as PRIOR_M2STAT
FROM
M2P
Note that this wraps the subquery in an IFNULL to handle the case where it is the first sequence number and no prior sequence exists.
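Here is a small sketch of the "prior record by key" subquery with both fixes from the EDIT above (an outer alias T1 and ORDER BY sequence descending), run from Python's sqlite3 module. It is not Db2 for i: SQLite uses LIMIT 1 where Db2 uses FETCH FIRST ROW ONLY, and the order, part, and status values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE M2P (M2ON TEXT, M2PN TEXT, M2SEQ INTEGER, M2STAT TEXT)")
# Hypothetical rows: order O1 has three operations, O2 has one.
conn.executemany(
    "INSERT INTO M2P VALUES (?, ?, ?, ?)",
    [("O1", "P1", 10, "C"), ("O1", "P1", 20, "O"), ("O1", "P1", 30, "O"),
     ("O2", "P2", 10, "C")],
)

# The outer table gets its own alias (T1) so the subquery compares against the
# current outer row rather than against M2P_1's own columns.
QUERY = """
SELECT T1.M2ON, T1.M2SEQ, T1.M2STAT,
       IFNULL((SELECT M2P_1.M2STAT
               FROM M2P AS M2P_1
               WHERE M2P_1.M2ON = T1.M2ON
                 AND M2P_1.M2SEQ < T1.M2SEQ
               ORDER BY M2P_1.M2SEQ DESC  -- nearest lower sequence first
               LIMIT 1), '') AS PRIOR_M2STAT
FROM M2P AS T1
ORDER BY T1.M2ON, T1.M2SEQ
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Sequence 30 picks up the status of sequence 20 rather than 10 because of the descending sort, and the first sequence of each order gets '' from IFNULL.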
update Room set Status = case
when Room_Rev.In_DateTime IS NOT NULL and Room_Rev.Out_DateTime IS NULL
then 'U'
when Room_Rev.In_DateTime IS NOT NULL and Room_Rev.Out_DateTime IS NOT NULL
then 'A'
when Room.Status!='R' and Room.Status!='U' and Room.Status!='A'
then Room.Status
else 'R'
end
FROM Room JOIN Room_Rev
ON Room.Room_ID=Room_Rev.Room_ID
and
((Room_Rev.Start_Date >= '2015-03-22' and Room_Rev.End_Date <= '2015-03-22')
OR
(Room_Rev.Start_Date<= '2015-03-22' and Room_Rev.End_Date> '2015-03-22')
OR
(Room_Rev.Start_Date< '2015-03-22' and Room_Rev.End_Date>= '2015-03-22'))
How can I add ORDER BY Rev_ID DESC to this query?
There are two tables, Room and Room_Rev, with a one-to-many relationship.
Of the last two rows for ROM0006, the first already has In_DateTime and Out_DateTime filled in, so it counts as checked out.
The last row is a new reservation whose In_DateTime is NULL,
so I need the query to return 'R' (Reserved status).
As one possible solution, I suggest a nested query instead of a join in the UPDATE statement. The logic of the update is not completely clear to me, so I leave it to the OP to set the correct sort order (note that I used TOP 1 and ORDER BY Room_ID in the nested SELECT statement; ordering by Room_Rev.Rev_ID DESC instead would pick the row with the highest Rev_ID). However, this approach lets you use all the usual techniques available in a SELECT.
update Room set Status = (select TOP 1 case
when Room_Rev.In_DateTime IS NOT NULL and Room_Rev.Out_DateTime IS NULL
then 'U'
when Room_Rev.In_DateTime IS NOT NULL and Room_Rev.Out_DateTime IS NOT NULL
then 'A'
when Room.Status!='R' and Room.Status!='U' and Room.Status!='A'
then Room.Status
else 'R'
end
FROM Room_Rev
WHERE Room.Room_ID=Room_Rev.Room_ID
and
((Room_Rev.Start_Date >= '2015-03-22' and Room_Rev.End_Date <= '2015-03-22')
OR
(Room_Rev.Start_Date<= '2015-03-22' and Room_Rev.End_Date> '2015-03-22')
OR
(Room_Rev.Start_Date< '2015-03-22' and Room_Rev.End_Date>= '2015-03-22'))
ORDER BY Room_Rev.Room_Id
)
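A minimal sketch of the nested-query approach with ORDER BY Rev_ID DESC, run from Python's sqlite3 module rather than SQL Server (LIMIT 1 replaces TOP 1). Two assumptions are labelled in the code: the three date conditions are written as a single range test, which selects the same rows as long as Start_Date <= End_Date; and a WHERE EXISTS guard is added, because without a WHERE clause the UPDATE would set Status to NULL for rooms that have no matching reservation. All room and reservation data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Room (Room_ID TEXT PRIMARY KEY, Status TEXT)")
conn.execute(
    "CREATE TABLE Room_Rev (Rev_ID INTEGER PRIMARY KEY, Room_ID TEXT, Start_Date TEXT,"
    " End_Date TEXT, In_DateTime TEXT, Out_DateTime TEXT)"
)
# Hypothetical sample data.
conn.executemany("INSERT INTO Room VALUES (?, ?)",
                 [("ROM0005", "R"), ("ROM0006", "A"), ("ROM0007", "A")])
conn.executemany(
    "INSERT INTO Room_Rev VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "ROM0006", "2015-03-20", "2015-03-22", "2015-03-20 14:00", "2015-03-22 10:00"),
     (2, "ROM0006", "2015-03-22", "2015-03-24", None, None),  # new reservation
     (3, "ROM0005", "2015-03-21", "2015-03-23", "2015-03-21 14:00", None)],
)

# Assumption: "reservation active on :d" is Start_Date <= :d <= End_Date.
conn.execute("""
UPDATE Room SET Status = (
    SELECT CASE
             WHEN Room_Rev.In_DateTime IS NOT NULL AND Room_Rev.Out_DateTime IS NULL THEN 'U'
             WHEN Room_Rev.In_DateTime IS NOT NULL AND Room_Rev.Out_DateTime IS NOT NULL THEN 'A'
             WHEN Room.Status != 'R' AND Room.Status != 'U' AND Room.Status != 'A' THEN Room.Status
             ELSE 'R'
           END
    FROM Room_Rev
    WHERE Room_Rev.Room_ID = Room.Room_ID
      AND Room_Rev.Start_Date <= :d AND Room_Rev.End_Date >= :d
    ORDER BY Room_Rev.Rev_ID DESC  -- newest reservation first
    LIMIT 1
)
WHERE EXISTS (SELECT 1 FROM Room_Rev
              WHERE Room_Rev.Room_ID = Room.Room_ID
                AND Room_Rev.Start_Date <= :d AND Room_Rev.End_Date >= :d)
""", {"d": "2015-03-22"})
conn.commit()

rows = conn.execute("SELECT Room_ID, Status FROM Room ORDER BY Room_ID").fetchall()
print(rows)
```

ROM0006 ends up as 'R' because the newest reservation (Rev_ID 2) has no In_DateTime, and ROM0007 keeps its status thanks to the EXISTS guard.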
PS: As a piece of advice, I still think this approach is not ideal, because it works against proper normalization of the data. You'd be better off querying this information dynamically when required instead of writing a static value to Room.Status.