SQL remove duplicates from results


select DAC.LocationCode, DAC.Description, ReqApp.Rank, App.Approver as UserName,
CASE WHEN app.Approver = app.AlternateApprover THEN ''
ELSE AltApp.AlternateApprover END As AltApprover,
ISNULL(CONVERT(Varchar,AltApp.FromDate,101),'')AS FromDate,
ISNULL(CONVERT(Varchar,AltApp.ToDate,101),'')AS ToDate
from tblAPAlternateApprovers App
INNER JOIN tblAPAlternateApprovers AltApp
ON App.ID = AltApp.ID
INNER JOIN tblAPReqLocations DAC
ON App.tblAPReqLocationsID = DAC.ID
INNER JOIN tblAPReqApprover ReqApp
ON App.Approver = ReqApp.Approver AND
App.tblAPReqLocationsID = ReqApp.LocationID
ORDER BY DAC.LocationCode ASC, ReqApp.Rank asc
When SQL adds an 'alternate approver' (for purchase orders), it creates an additional record for the actual approver. I'm trying to find a way to show only one record for approvers that also have alternates. For example, 'jlhayes' has two records: one with an alternate and one without. For these approvers, I only want to see the record that has an alternate. Thank you for your help. I've spent a couple of hours on this and am out of ideas.

You can wrap the AltApprover CASE expression in MAX() and group by DAC.LocationCode, DAC.Description, ReqApp.Rank, App.Approver, then do the same for FromDate and ToDate:
select DAC.LocationCode, DAC.Description, ReqApp.Rank, App.Approver as UserName,
max(CASE WHEN app.Approver = app.AlternateApprover THEN ''
ELSE AltApp.AlternateApprover END) As AltApprover,
max(ISNULL(CONVERT(Varchar,AltApp.FromDate,101),'')) AS FromDate,
max(ISNULL(CONVERT(Varchar,AltApp.ToDate,101),'')) AS ToDate
from tblAPAlternateApprovers App
INNER JOIN tblAPAlternateApprovers AltApp
ON App.ID = AltApp.ID
INNER JOIN tblAPReqLocations DAC
ON App.tblAPReqLocationsID = DAC.ID
INNER JOIN tblAPReqApprover ReqApp
ON App.Approver = ReqApp.Approver AND
App.tblAPReqLocationsID = ReqApp.LocationID
GROUP BY DAC.LocationCode, DAC.Description, ReqApp.Rank, App.Approver
ORDER BY DAC.LocationCode ASC, ReqApp.Rank asc
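To see why MAX() with GROUP BY collapses the duplicate rows, here is a minimal sketch using Python's built-in sqlite3 module. The approvers table, its columns, and the sample rows are hypothetical stand-ins for the joined result in the question, not the real schema.

```python
import sqlite3

# Hypothetical, simplified data: 'jlhayes' has one row with an alternate
# and one row without, mirroring the situation described in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE approvers (location TEXT, approver TEXT, alt_approver TEXT);
    INSERT INTO approvers VALUES
        ('A01', 'jlhayes', ''),
        ('A01', 'jlhayes', 'bsmith'),
        ('A01', 'mjones',  '');
""")

# MAX() keeps the non-empty value because the empty string sorts
# before any other text, so each approver collapses to one row.
rows = conn.execute("""
    SELECT location, approver, MAX(alt_approver) AS alt_approver
    FROM approvers
    GROUP BY location, approver
    ORDER BY location, approver
""").fetchall()

for row in rows:
    print(row)
```

One thing to keep in mind with this approach: each MAX() is evaluated independently, so if an approver could have several alternates, the AltApprover, FromDate and ToDate values in one output row are not guaranteed to come from the same source row.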

Related

SQL nested grouping issue

Here's my query:
select
cast(ar.AudienceCreationDate as date) as AudienceDate,
Count(*) as [Count],
count(case when ar.Source = 'Contact' then ar.Id end) as PatientCount,
count(case when ar.Source = 'PatientContact' then ar.Id end) as PatientContactCount,
(
select
count(*)
from
_SMSMessageTracking sms
inner join
[CTT Preferences] pref on pref.ContactId = sms.SubscriberKey
where
sms.Name <> 'ky_ctt_join' and pref.Source = 'Patient'
) as PatientSMS,
(
select
count(*)
from
_SMSMessageTracking sms
inner join
[CTT Preferences] pref on pref.ContactId = sms.SubscriberKey
where
sms.Name <> 'ky_ctt_join' and pref.Source = 'PatientContact'
) as PatientContactSMS
from
Daily_Symptom_Check_Audience_Archive ar
group by
cast(ar.AudienceCreationDate as date)
And here's the result set it creates:
The issue I'm having is that the values in the rightmost two columns are the same across the board, for all records. This number represents the TOTAL, and not aggregated by day, as the other values indicate. I realize that I'm doing something wrong - what can I do to modify my query to effectively have a proper "grouping" on these last two columns just like all the other data in this table?

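In the WHERE clause of each nested query, add a condition such as sms.AudienceDate = ar.AudienceDate. That correlates the subquery with the outer row, so it counts only that day's messages instead of the whole table.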
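The difference between an uncorrelated and a correlated subquery can be sketched with Python's sqlite3 module. The audience and sms tables, and in particular the sent date column on sms, are assumptions made for illustration; the question does not show which date column the SMS table actually has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables; 'sent' is an assumed date column.
    CREATE TABLE audience (id INTEGER, created TEXT);
    CREATE TABLE sms (id INTEGER, sent TEXT);
    INSERT INTO audience VALUES (1, '2020-05-01'), (2, '2020-05-01'), (3, '2020-05-02');
    INSERT INTO sms VALUES (1, '2020-05-01'), (2, '2020-05-02'), (3, '2020-05-02');
""")

# Uncorrelated: the subquery ignores the outer row, so every day
# shows the grand total (the symptom described in the question).
uncorrelated = conn.execute("""
    SELECT created, COUNT(*) AS audience_count,
           (SELECT COUNT(*) FROM sms) AS sms_count
    FROM audience
    GROUP BY created
    ORDER BY created
""").fetchall()

# Correlated: the subquery is tied to the outer day, so it is
# evaluated once per group and returns that day's count.
correlated = conn.execute("""
    SELECT a.created, a.audience_count,
           (SELECT COUNT(*) FROM sms s WHERE s.sent = a.created) AS sms_count
    FROM (SELECT created, COUNT(*) AS audience_count
          FROM audience
          GROUP BY created) a
    ORDER BY a.created
""").fetchall()

print(uncorrelated)
print(correlated)
```

Grouping in a derived table first and correlating against its date column keeps the outer reference unambiguous, which is one way to sidestep GROUP BY restrictions on correlated columns.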

Should a subquery on a join use tables from an outer query in the where clause?

I need to add a subquery to a join, because one payment can have more than one allotment, so I only need to account for the first match (where rownum = 1).
However, I'm not sure if adding pmt from the outer query to the subquery on the allotment join is best.
Should I be doing this differently to avoid performance hits?
SELECT
pmt.payment_uid,
alt.allotment_uid
FROM
payment pmt
/* HERE: is the reference to pmt.pay_key and pmt.client_id
incorrect in the below subquery? */
INNER JOIN allotment alt ON alt.allotment_uid = (
SELECT
allotment_uid
FROM
allotment
WHERE
pay_key = pmt.pay_key
AND
pay_code = 'xyz'
AND
deleted = 'N'
AND
client_id = pmt.client_id
AND
ROWNUM = 1
)
WHERE
pmt.deleted = 'N'
AND
pmt.date_paid >= TO_DATE('2017-07-01')
AND
pmt.date_paid < TO_DATE('2017-10-01') + 1;

It's difficult to identify a performance issue in your query without seeing the explain plan output. Your query does seem to run an additional SELECT on allotment for every record from the main query.
Here is a version which doesn't use a correlated subquery. I haven't been able to test it. It does a simple join and then filters out all but one of the allotments for each payment. Hope this helps.
WITH v_payment
AS
(
SELECT
pmt.payment_uid,
alt.allotment_uid,
ROW_NUMBER() OVER (PARTITION BY pmt.payment_uid ORDER BY alt.allotment_uid) r_num
FROM
payment pmt JOIN allotment alt
ON (pmt.pay_key = alt.pay_key AND
pmt.client_id = alt.client_id)
WHERE pmt.deleted = 'N' AND
pmt.date_paid >= TO_DATE('2017-07-01') AND
pmt.date_paid < TO_DATE('2017-10-01') + 1 AND
alt.pay_code = 'xyz' AND
alt.deleted = 'N'
)
SELECT payment_uid,
allotment_uid
FROM v_payment
WHERE r_num = 1;
Let me know how this performs!

You can phrase the query that way. I would be more likely to do:
SELECT . . .
FROM payment p INNER JOIN
(SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY pay_key, client_id
ORDER BY allotment_uid
) as seqnum
FROM allotment a
WHERE pay_code = 'xyz' AND deleted = 'N'
) a
ON a.pay_key = p.pay_key AND a.client_id = p.client_id AND
seqnum = 1
WHERE p.deleted = 'N' AND
p.date_paid >= DATE '2017-07-01' AND
p.date_paid < (DATE '2017-10-01') + 1;
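The ROW_NUMBER() approach can be tried on a toy data set. This is a minimal sketch using Python's sqlite3 module (window functions need SQLite 3.25 or later); the table contents are made up, and only the columns needed for the join are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data: payment 1 has two matching allotments.
    CREATE TABLE payment (payment_uid INTEGER, pay_key TEXT, client_id INTEGER);
    CREATE TABLE allotment (allotment_uid INTEGER, pay_key TEXT, client_id INTEGER);
    INSERT INTO payment VALUES (1, 'K1', 10), (2, 'K2', 10);
    INSERT INTO allotment VALUES (101, 'K1', 10), (102, 'K1', 10), (201, 'K2', 10);
""")

# Number the allotments within each (pay_key, client_id) group and
# join only to the first one, so each payment appears exactly once.
rows = conn.execute("""
    SELECT p.payment_uid, a.allotment_uid
    FROM payment p
    JOIN (SELECT al.*,
                 ROW_NUMBER() OVER (PARTITION BY pay_key, client_id
                                    ORDER BY allotment_uid) AS seqnum
          FROM allotment al) a
      ON a.pay_key = p.pay_key
     AND a.client_id = p.client_id
     AND a.seqnum = 1
    ORDER BY p.payment_uid
""").fetchall()

print(rows)  # [(1, 101), (2, 201)]
```

Unlike ROWNUM = 1, the explicit ORDER BY inside the window makes the choice of "first" allotment deterministic.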

SQL Server / T-SQL : query optimization assistance

I have this QA logic that looks at every AuditID within a RoomID to see whether the room's AuditType was never marked Complete or was marked Complete more than once. Finally, it picks only the maximum AuditDate of the RoomIDs with errors to avoid showing multiple instances of the same RoomID, since there are many audits per room.
The issue is that the AUDIT table is very large and the query takes a long time to run. I was wondering if there is any way to reach the same result faster.
Thank you in advance !
IF object_ID('tempdb..#AUDIT') is not null drop table #AUDIT
IF object_ID('tempdb..#ROOMS') is not null drop table #ROOMS
IF object_ID('tempdb..#COMPLETE') is not null drop table #COMPLETE
IF object_ID('tempdb..#FINALE') is not null drop table #FINALE
SELECT distinct
oc.HotelID, o.RoomID
INTO #ROOMS
FROM dbo.[rooms] o
LEFT OUTER JOIN dbo.[hotels] oc on o.HotelID = oc.HotelID
WHERE
o.[status] = '2'
AND o.orderType = '2'
SELECT
t.AuditID, t.RoomID, t.AuditDate, t.AuditType
INTO
#AUDIT
FROM
[dbo].[AUDIT] t
WHERE
t.RoomID IN (SELECT RoomID FROM #ROOMS)
SELECT
t1.RoomID, t3.AuditType, t3.AuditDate, t3.AuditID, t1.CompleteStatus
INTO
#COMPLETE
FROM
(SELECT
RoomID,
SUM(CASE WHEN AuditType = 'Complete' THEN 1 ELSE 0 END) AS CompleteStatus
FROM
#AUDIT
GROUP BY
RoomID) t1
INNER JOIN
#AUDIT t3 ON t1.RoomID = t3.RoomID
WHERE
t1.CompleteStatus = 0
OR t1.CompleteStatus > 1
SELECT
o.HotelID, o.RoomID,
a.AuditID, a.RoomID, a.AuditDate, a.AuditType, a.CompleteStatus,
c.ClientNum
INTO
#FINALE
FROM
#ROOMS O
LEFT OUTER JOIN
#COMPLETE a on o.RoomID = a.RoomID
LEFT OUTER JOIN
[dbo].[clients] c on o.clientNum = c.clientNum
SELECT
t.*,
Complete_Error_Status = CASE WHEN t.CompleteStatus = 0
THEN 'Not Complete'
WHEN t.CompleteStatus > 1
THEN 'Complete More Than Once'
END
FROM
#FINALE t
INNER JOIN
(SELECT
RoomID, MAX(AuditDate) AS MaxDate
FROM
#FINALE
GROUP BY
RoomID) tm ON t.RoomID = tm.RoomID AND t.AuditDate = tm.MaxDate

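One section you could improve is this one. See the inline comments. Be aware that once the WHERE clause filters on AuditType = 'Complete', rooms with no completed audit produce no group at all, so the CompleteStatus = 0 condition can never match; this version only finds rooms completed more than once.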
SELECT
t1.RoomID, t3.AuditType, t3.AuditDate, t3.AuditID, t1.CompleteStatus
INTO
#COMPLETE
FROM
(SELECT
RoomID,
COUNT(1) AS CompleteStatus
-- Use the above along with the WHERE clause below
-- so that you are aggregating fewer records and
-- avoiding a CASE statement. Remove this next line.
--SUM(CASE WHEN AuditType = 'Complete' THEN 1 ELSE 0 END) AS CompleteStatus
FROM
#AUDIT
WHERE
AuditType = 'Complete'
GROUP BY
RoomID) t1
INNER JOIN
#AUDIT t3 ON t1.RoomID = t3.RoomID
WHERE
t1.CompleteStatus = 0
OR t1.CompleteStatus > 1
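The two aggregation styles are not interchangeable, which a small experiment makes visible. The sketch below uses Python's sqlite3 module with a hypothetical audit table: room 1 was never completed, room 2 was completed twice, and room 3 was completed once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data standing in for #AUDIT.
    CREATE TABLE audit (room_id INTEGER, audit_type TEXT);
    INSERT INTO audit VALUES
        (1, 'Open'), (1, 'Review'),
        (2, 'Complete'), (2, 'Complete'),
        (3, 'Complete');
""")

# SUM(CASE ...) sees every row, so a room with no 'Complete'
# audits still gets a group with a count of 0.
sum_case = conn.execute("""
    SELECT room_id, n
    FROM (SELECT room_id,
                 SUM(CASE WHEN audit_type = 'Complete' THEN 1 ELSE 0 END) AS n
          FROM audit
          GROUP BY room_id) t
    WHERE n = 0 OR n > 1
    ORDER BY room_id
""").fetchall()

# COUNT with a WHERE filter drops the non-'Complete' rows first,
# so room 1 never forms a group and the n = 0 test cannot match.
count_where = conn.execute("""
    SELECT room_id, n
    FROM (SELECT room_id, COUNT(*) AS n
          FROM audit
          WHERE audit_type = 'Complete'
          GROUP BY room_id) t
    WHERE n = 0 OR n > 1
    ORDER BY room_id
""").fetchall()

print(sum_case)     # [(1, 0), (2, 2)]
print(count_where)  # [(2, 2)]
```

So the filtered COUNT is cheaper, but it only serves the "completed more than once" check; the "never completed" rooms would need a separate anti-join or the original SUM(CASE ...) form.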

Just a thought: streamline your code and your solution. You are not effectively filtering your datasets down, so you continue to query entire tables, which takes a lot of resources. Your temp tables also become full copies of those columns without the indexes (PK, FK, and so on) that exist on the original tables. This is by no means a perfect solution, but it shows how you can consolidate your logic and reduce your overall data set. Give it a try and see if it performs better for you.
Note that this will return the last audit record for any room that has either not had an audit completed or has had one completed more than once.
;WITH cte AS (
SELECT
o.RoomId
,o.HotelId
,o.clientNum
,a.AuditId
,a.AuditDate
,a.AuditType
,NumOfAuditsComplete = SUM(CASE WHEN a.AuditType = 'Complete' THEN 1 ELSE 0 END) OVER (PARTITION BY o.RoomId)
,RowNum = ROW_NUMBER() OVER (PARTITION BY o.RoomId ORDER BY a.AuditDate DESC)
FROM
dbo.Rooms o
LEFT JOIN dbo.Audit a
ON o.RoomId = a.RoomId
WHERE
o.[Status] = 2
AND o.OrderType = 2
)
SELECT
oc.HotelId
,cte.RoomId
,cte.AuditId
,cte.AuditDate
,cte.AuditType
,cte.NumOfAuditsComplete
,cte.clientNum
,Complete_Error_Status = CASE WHEN cte.NumOfAuditsComplete > 1 THEN 'Complete More Than Once' ELSE 'Not Complete' END
FROM
cte
LEFT JOIN dbo.Hotels oc
ON cte.HotelId = oc.HotelId
LEFT JOIN dbo.clients c
ON cte.clientNum = c.clientNum
WHERE
cte.RowNum = 1
AND cte.NumOfAuditsComplete != 1
Also note I changed your
WHERE
o.[status] = '2'
AND o.orderType = '2'
TO
WHERE
o.[status] = 2
AND o.orderType = 2
to be numeric, without the single quotes. If the data type is truly varchar, add them back. A type mismatch between the column and the literal forces an implicit conversion, and when that conversion lands on the column side the query may not be able to take advantage of indexes that you have built on the table.

Get the rest of the row in a max group by

I'm trying to acquire the most recently passed training someone has taken. To do this, I have a view that works great
CREATE OR REPLACE FORCE VIEW MYAPP.most_recent_training (
employee_id, course_id, date_taken
) AS SELECT
who.employee_id,
course.course_id,
MAX(sess.end_date) date_taken
FROM employee_session_join esj
JOIN training_session sess on sess.session_id = esj.session_id
JOIN course_version vers on vers.version_id = sess.version_id
JOIN course course on course.course_id = vers.course_id
JOIN employee who on who.employee_id = esj.employee_id
WHERE esj.active_flag = 'Y'
AND sess.active_flag = 'Y'
AND course.active_flag = 'Y'
AND who.active_flag = 'Y'
AND esj.approval_status = 5 -- successfully passed
GROUP BY who.employee_id, course.course_id
Okay, so my query works well. Here's my problem: I also need the expiry date so I know when they go out of compliance. This is stored as a number of months on the version. But I can't add vers.valid_for_months because it complains with ORA-00979: not a GROUP BY expression.
I just want to get whatever the rest of that row is. How can I do this?

I would think this would solve your problem:
SELECT who.employee_id, course.course_id,
MAX(add_months(sess.end_date, vers.valid_for_months))
That gets the latest expiry date across all sessions. If you want the row for the most recent session, use row_number():
SELECT employee_id, course_id, end_date
FROM (SELECT who.employee_id, course.course_id, sess.end_date,
row_number() over (partition by who.employee_id, course.course_id
order by sess.end_date desc
) as seqnum
FROM employee_session_join esj
JOIN training_session sess on sess.session_id = esj.session_id
JOIN course_version vers on vers.version_id = sess.version_id
JOIN course course on course.course_id = vers.course_id
JOIN employee who on who.employee_id = esj.employee_id
WHERE esj.active_flag = 'Y'
AND sess.active_flag = 'Y'
AND course.active_flag = 'Y'
AND who.active_flag = 'Y'
AND esj.approval_status = 5 -- successfully passed
) e
WHERE seqnum = 1;
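To illustrate how row_number() brings back "the rest of the row", here is a minimal sketch with Python's sqlite3 module (SQLite 3.25 or later). The single training table is a hypothetical flattening of the joined tables in the question; carrying valid_for_months through is the assumed goal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened data: employee 1 passed course 7 twice,
    -- under versions with different validity periods.
    CREATE TABLE training (
        employee_id INTEGER, course_id INTEGER,
        end_date TEXT, valid_for_months INTEGER);
    INSERT INTO training VALUES
        (1, 7, '2019-01-15', 12),
        (1, 7, '2020-03-10', 24),
        (2, 7, '2018-06-01', 12);
""")

# Rank sessions newest-first within each employee/course pair, then
# keep rank 1. Every column of that row is available, so no GROUP BY
# restriction applies to valid_for_months.
rows = conn.execute("""
    SELECT employee_id, course_id, end_date, valid_for_months
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY employee_id, course_id
                                    ORDER BY end_date DESC) AS seqnum
          FROM training t) ranked
    WHERE seqnum = 1
    ORDER BY employee_id
""").fetchall()

print(rows)  # [(1, 7, '2020-03-10', 24), (2, 7, '2018-06-01', 12)]
```

In Oracle the expiry date could then be computed from the selected row with add_months(end_date, valid_for_months), as in the first snippet of the answer.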

Show records where team assignment = 1

Changing TeamRecords = 1 below to another number finds the rows with that many assignments, but sometimes it counts one too many, which is odd. When a new Incident is created it gets a unique number, and every time a new assignment is added from the Task table it adds another row with that IncidentNumber. So there can be duplicate IncidentNumber rows, which I've removed with seq = 1 below. Each new assignment creates a new CreatedDateTime in the Task table, so for example you could use MAX(t.[CreatedDateTime]) to find the last assignment of any IncidentNumber. What I need from TeamRecords = 1 is all records for that specific team where there is only one assignment for that team.
Does that help any?
Here is what I have so far...
Use TEST
Go
WITH RankResult AS
(
SELECT i.[IncidentNumber],
i.[CreatedDateTime],
i.[ResolutionDateAndTime],
i.[Priority],
i.[Status],
i.[ClientName],
i.[ClientSite],
t.[OwnerTeam],
t.[Owner],
row_number() over( partition by i.RecID
order by t.CreatedDateTime desc, t.OwnerTeam ) seq,
TeamRecords = COUNT(*) OVER(PARTITION BY t.ParentLink_RecID)
FROM Incident as i
Inner JOIN Task as t
ON i.RecID = t.ParentLink_RecID
WHERE t.OwnerTeam = 'Infrastructure Services'
AND i.CreatedDateTime >= '20121001'
AND i.CreatedDateTime <= '20131001'
)
SELECT DISTINCT
[IncidentNumber],
[CreatedDateTime],
[ResolutionDateAndTime],
[Priority],
[Status],
[ClientName],
[ClientSite],
[OwnerTeam],
[Owner]
FROM RankResult
Where TeamRecords = 1
And Seq = 1
Order By IncidentNumber Asc
GO

Using ROW_NUMBER means you will return the first assignment per team, not necessarily the teams with only one assignment. To do this you can use COUNT(*) OVER():
WITH RankResult AS
(
SELECT i.[IncidentNumber],
i.[CreatedDateTime],
i.[ResolutionDateAndTime],
i.[Priority],
i.[Status],
i.[ClientName],
i.[ClientSite],
t.[OwnerTeam],
t.[Owner],
TeamRecords = COUNT(*) OVER(PARTITION BY i.RecID)
FROM Incident as i
INNER JOIN Task as t
ON i.RecID = t.ParentLink_RecID
WHERE t.OwnerTeam = 'Info Services'
AND i.CreatedDateTime >= '20121001'
AND i.CreatedDateTime <= '20131001'
)
SELECT DISTINCT
[IncidentNumber],
[CreatedDateTime],
[ResolutionDateAndTime],
[Priority],
[Status],
[ClientName],
[ClientSite],
[OwnerTeam],
[Owner]
FROM RankResult
WHERE TeamRecords = 1;
Two things to note that I have changed in addition to the analytic function. Firstly, I have changed your dates to the culture-independent format yyyyMMdd. The format yyyy-MM-dd can still be ambiguous, so 2013-01-02 could be 1 February or 2 January depending on your server/session settings. Secondly, your WHERE clause was turning your LEFT JOIN into an INNER JOIN anyway, so I just made it an INNER JOIN. The original was:
FROM Incident as i
LEFT JOIN Task as t
ON i.RecID = t.ParentLink_RecID
WHERE t.OwnerTeam = 'Info Services'
AND i.CreatedDateTime >= '20121001'
Here, if there is no match in Task then OwnerTeam will be NULL, and NULL = 'Info Services' evaluates to unknown rather than true, so the WHERE clause discards the row. You will never return rows with no match in Task, which makes the join behave as an INNER JOIN. If you did in fact want a LEFT JOIN, you need to move this predicate to the JOIN:
FROM Incident as i
LEFT JOIN Task as t
ON i.RecID = t.ParentLink_RecID
AND t.OwnerTeam = 'Info Services'
WHERE i.CreatedDateTime >= '20121001'
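The effect of placing the filter in WHERE versus ON can be checked with a small experiment. This sketch uses Python's sqlite3 module; the two tables and their rows are hypothetical, with incident INC2 deliberately having no task.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data: INC2 has no matching task row.
    CREATE TABLE incident (rec_id INTEGER, number TEXT);
    CREATE TABLE task (parent_rec_id INTEGER, owner_team TEXT);
    INSERT INTO incident VALUES (1, 'INC1'), (2, 'INC2');
    INSERT INTO task VALUES (1, 'Info Services');
""")

# Filter in WHERE: the NULL owner_team produced for INC2 fails the
# comparison, so the LEFT JOIN behaves like an INNER JOIN.
where_filter = conn.execute("""
    SELECT i.number, t.owner_team
    FROM incident i
    LEFT JOIN task t ON i.rec_id = t.parent_rec_id
    WHERE t.owner_team = 'Info Services'
    ORDER BY i.number
""").fetchall()

# Filter in ON: unmatched incidents are preserved with NULLs.
on_filter = conn.execute("""
    SELECT i.number, t.owner_team
    FROM incident i
    LEFT JOIN task t ON i.rec_id = t.parent_rec_id
                    AND t.owner_team = 'Info Services'
    ORDER BY i.number
""").fetchall()

print(where_filter)  # [('INC1', 'Info Services')]
print(on_filter)     # [('INC1', 'Info Services'), ('INC2', None)]
```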

You can always use a simple subquery which groups across the relevant columns, counts them, filters where the count is 1, then joins it back to the main table to select the appropriate rows:
SELECT Incident.*
FROM (
SELECT OwnerTeam
FROM Incident AS Inc1
GROUP BY OwnerTeam
HAVING COUNT( * ) = 1
) AS Team
, Incident
WHERE Incident.OwnerTeam = Team.OwnerTeam
(Without more information, though, it's difficult to say if this will work for you.)
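As a rough illustration of the group, count, filter and join-back pattern, here is a sketch with Python's sqlite3 module. It is a variation rather than a copy of the query above: it assumes a hypothetical task table that holds the team assignment, and it groups by incident so that "count is 1" means one assignment for that team on that incident.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data: incident 2 has two assignments
    -- for the same team, incident 1 has exactly one.
    CREATE TABLE task (incident INTEGER, owner_team TEXT, owner TEXT);
    INSERT INTO task VALUES
        (1, 'Info Services', 'ann'),
        (2, 'Info Services', 'bob'),
        (2, 'Info Services', 'cy'),
        (3, 'Desktop',       'di');
""")

# The subquery finds incidents with exactly one assignment for the
# team; joining back recovers the full rows for those incidents.
rows = conn.execute("""
    SELECT t.incident, t.owner
    FROM task t
    JOIN (SELECT incident
          FROM task
          WHERE owner_team = 'Info Services'
          GROUP BY incident
          HAVING COUNT(*) = 1) single
      ON single.incident = t.incident
    WHERE t.owner_team = 'Info Services'
    ORDER BY t.incident
""").fetchall()

print(rows)  # [(1, 'ann')]
```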
