Find incorrect records by Id - sql

I am trying to find records where the personID is associated to the incorrect SoundFile(String). I am trying to search for incorrect records among all personID's, not just one specific one. Here are my example tables:
TASKS-
PersonID SoundFile(String)
123 D10285.18001231234.mp3
123 D10236.18001231234.mp3
123 D10237.18001231234.mp3
123 D10212.18001231234.mp3
123 D12415.18001231234.mp3
**126 D19542.18001231234.mp3
126 D10235.18001234567.mp3
126 D19955.18001234567.mp3
RECORDINGS-
PhoneNumber(Distinct Records)
18001231234
18001234567
So in this example, I am trying to find all records like the one that I indented. The majority of the soundfiles like '%18001231234%' are associated to PersonID 123, but this one record is PersonID 126. I need to find all records where for all distinct numbers from the Recordings table, the PersonID(s) is not the majority.
Let me know if you need more information!
Thanks in advance!!

; WITH distinctRecordings AS (
SELECT DISTINCT PhoneNumber
FROM Recordings
),
PersonCounts as (
SELECT t.PersonID, dr.PhoneNumber, COUNT(*) AS num
FROM
Tasks t
JOIN distinctRecordings dr
ON t.SoundFile LIKE '%' + dr.PhoneNumber + '%'
GROUP BY t.PersonID, dr.PhoneNumber
)
SELECT t.PersonID, t.SoundFile
FROM PersonCounts pc1
JOIN PersonCounts pc2
ON pc2.PhoneNumber = pc1.PhoneNumber
AND pc2.PersonID <> pc1.PersonID
AND pc2.Num < pc1.Num
JOIN Tasks t
ON t.PersonID = pc2.PersonID
AND t.SoundFile LIKE '%' + pc2.PhoneNumber + '%'
SQL Fiddle Here
To summarize what this does... the first CTE, distinctRecordings, is just a distinct list of the Phone Numbers in Recordings.
Next, PersonCounts is a count of phone numbers associated with the records in Tasks for each PersonID.
This is then joined to itself to find any duplicates, and selects whichever duplicate has the smaller count... this is then joined back to Tasks to get the offending soundFile for that person / phone number.
(If your schema had some minor improvements made to it, this query would have been much simpler...)

here you go, receiving all pairs (PersonID, PhoneNumber) where the person has less entries with the given phone number than the person with the maximum entries. note that the query doesn't cater for multiple persons on par within a group.
select agg.pid
, agg.PhoneNumber
from (
select MAX(c) KEEP ( DENSE_RANK FIRST ORDER BY c DESC ) OVER ( PARTITION BY rt.PhoneNumber ) cmax
, rt.PhoneNumber
, rt.PersonID pid
, rt.c
from (
select r.PhoneNumber
, t.PersonID
, count(*) c
from recordings r
inner join tasks t on ( r.PhoneNumber = regexp_replace(t.SoundFile, '^[^.]+\.([^.]+)\.[^.]+$', '\1' ) )
group by r.PhoneNumber
, t.PersonID
) rt
) agg
where agg.c < agg.cmax
;
caveat: the solution is in oracle syntax though the operations should be in the current sql standard (possibly apart from regexp_replace, which might not matter too much since your sound file data seems to follow a fixed-position structure ).

Related

Query keeps giving me duplicate records. How can I fix this?

I wrote a query which uses 2 temp tables. And then joins them into 1. However, I am seeing duplicate records in the student visit temp table. (Query is below). How could this be modified to remove the duplicate records of the visit temp table?
with clientbridge as (Select *
from (Select visitorid, --Visid
roomnumber,
room_id,
profid,
student_id,
ambc.datekey,
RANK() over(PARTITION BY visitorid,student_id,profid ORDER BY ambc.datekey desc) as rn
from university.course_office_hour_bridge cohd
--where student_id = '9999999-aaaa-6634-bbbb-96fa18a9046e'
)
where rn = 1 --visitorid = '999999999999999999999999999999'---'1111111111111111111111111111111' --and pai.datekey is not null --- 00000000000000000000000000
),
-----------------Data Header Table
studentvisit as
(SELECT
--Visit key will allow us to track everything they did within that visit.
distinct visid_visitorid,
--calcualted_visitorid,
uniquevisitkey,
--channel, -- says the room they're in. Channel might not be reliable would need to see how that operates
--office_list, -- add 7 to exact
--user_college,
--first_office_hour_name,
--first_question_time_attended,
studentaccountid_5,
profid_officenumber_8,
studentvisitstarttime,
room_id_115,
--date_time,
qqq144, --Course Name
qqq145, -- Course Office Hour Benefit
qqq146, --Course Office Hour ID
datekey
FROM university.office_hour_details ohd
--left_join niversity.course_office_hour_bridge cohd on ohd.visid_visitorid
where DateKey >='2022-10-01' --between '2022-10-01' and '2022-10-27'
and (qqq146 <> '')
)
select
*
from clientbridge ab inner join studentvisit sv on sv.visid_visitorid = cb.visitorid
I wrote a query which uses 2 temp tables. And then joins them into 1. However, I am seeing duplicate records in the student visit temp table. (Query is below). How could this be modified to remove the duplicate records of the visit temp table?
I think you may get have a better shot by joining the two datasets in the same query where you want the data ranked, otherwise your rank from query will be ignored within the results from the second query. Perhaps, something like ->
;with studentvisit as
(SELECT
--Visit key will allow us to track everything they did within that visit.
distinct visid_visitorid,
--calcualted_visitorid,
uniquevisitkey,
--channel, -- says the room they're in. Channel might not be reliable would need to see how that operates
--office_list, -- add 7 to exact
--user_college,
--first_office_hour_name,
--first_question_time_attended,
studentaccountid_5,
profid_officenumber_8,
studentvisitstarttime,
room_id_115,
--date_time,
qqq144, --Course Name
qqq145, -- Course Office Hour Benefit
qqq146, --Course Office Hour ID
datekey
FROM university.office_hour_details ohd
--left_join niversity.course_office_hour_bridge cohd on ohd.visid_visitorid
where DateKey >='2022-10-01' --between '2022-10-01' and '2022-10-27'
and (qqq146 <> '')
)
,clientbridge as (
Select
sv.*,
university.course_office_hour_bridge cohd, --Visid
roomnumber,
room_id,
profid,
student_id,
ambc.datekey,
RANK() over(PARTITION BY sv.visitorid,sv.student_id,sv,profid ORDER BY ambc.datekey desc) as rn
from university.course_office_hour_bridge cohd
inner join studentvisit sv on sv.visid_visitorid = cohd.visitorid
)
select
*
from clientbridge WHERE rn=1

SQL query to join smae table multiple times

I have a scenario to join the same table multiple times to get the desired output. For ex I have two tables TABLE A and TABLE B.
Step 1: I want to take the all the parties from TABLE A which have
lowest Idate. Lowest idate will be fetched based partyid and idate
column.
Step 2: Then based on CID which is fetched from TABLE A in step 1,
we need to fetch the corresponding MID from TABLE B which have
MIDTYPE=130300.
Step 3: Then based on the MID fetched in step 2 we need to traverse
the same table and find out the latest record for the same MID based
on idate in TABLE B and fetch the corresponding CID for the MID.
Step 4: Now for that CID we need to fetch MID value for MIDTYPE
130307 in the same table(TABLEB). And my final output should be combination of MID
which we fetched for step 3 and MID fetched for 130307 in step 4.
I write a query like this ..but its taking lot of time for the query to run as we are going through the same table(TABLEB) multiple times and TABLEB have millions of rows. Is there anyway we can rewrite this query in different way. Could some one can help with this me.
SELECT
ident.mid mid1,
b.mid mid2
FROM
(
SELECT
*
FROM
tableb
WHERE
midtype = 130307
) ident
INNER JOIN (
SELECT
s.cid,
s.mid,
s.midtype
FROM
(
SELECT
cid,
partyid,
admin_sys_tp_cd,
mid,
ilast
FROM
(
SELECT
cq.cid,
RANK() OVER(
PARTITION BY cq.partyid
ORDER BY
cq.idate ASC
) rnk,
cq.idate,
cq.partyid,
i.mid,
i.idate AS ilast
FROM
tablea cq
INNER JOIN tableb i ON cq.cid = i.cid
INNER JOIN tablec ON i.cid = c.cid
WHERE
i.midtype = 130300
)
WHERE
rnk = 1
) a
INNER JOIN (
SELECT
*
FROM
(
SELECT
cid,
mid,
midtype,
RANK() OVER(
PARTITION BY mid
ORDER BY
idate DESC
) rnk_mpid
FROM
tableb
)
WHERE
rnk_mpid = 1
) s ON a.mid = s.mid
AND s.midtype = 130300
) b ON ident.cid = b.cid
AND ident.midtype = 130307
not what you asked, but before others and I, spent time trying to get different approaches for you, let's make sure the basics are covered.
No matter how different you can write an SQL query, they will never perform fast, in a MILLION base table if you don't have the proper indexes for it. Specially in your case, as you have to access it 3 times at least.
Just by looking at your detailed steps. I would say that you should have at least 3 different indexes created to support this query.
TableA_Index1 ( PARTYID, LDATE, INCLUDES CID)
TableB_Index1 (CID, MIDTYPE, INCLUDES MID )
TableB_Index2 (MID, LDATE, INCLUDES CID )
Do you have them ?
Have you ever tried to run this query on db2-advisor (db2advis) to get recommended indexes for it ?

Modify my SQL Server query -- returns too many rows sometimes

I need to update the following query so that it only returns one child record (remittance) per parent (claim).
Table Remit_To_Activate contains exactly one date/timestamp per claim, which is what I wanted.
But when I join the full Remittance table to it, since some claims have multiple remittances with the same date/timestamps, the outermost query returns more than 1 row per claim for those claim IDs.
SELECT * FROM REMITTANCE
WHERE BILLED_AMOUNT>0 AND ACTIVE=0
AND REMITTANCE_UUID IN (
SELECT REMITTANCE_UUID FROM Claims_Group2 G2
INNER JOIN Remit_To_Activate t ON (
(t.ClaimID = G2.CLAIM_ID) AND
(t.DATE_OF_LATEST_REGULAR_REMIT = G2.CREATE_DATETIME)
)
where ACTIVE=0 and BILLED_AMOUNT>0
)
I believe the problem would be resolved if I included REMITTANCE_UUID as a column in Remit_To_Activate. That's the REAL issue. This is how I created the Remit_To_Activate table (trying to get the most recent remittance for a claim):
SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
MAX(claim_id) AS ClaimID,
INTO Latest_Remit_To_Activate
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID
Claims_Group2 contains these fields:
REMITTANCE_UUID,
CLAIM_ID,
BILLED_AMOUNT,
CREATE_DATETIME
Here are the 2 rows that are currently giving me the problem--they're both remitts for the SAME CLAIM, with the SAME TIMESTAMP. I only want one of them in the Remits_To_Activate table, so only ONE remittance will be "activated" per Claim:
enter image description here
You can change your query like this:
SELECT
p.*, latest_remit.DATE_OF_LATEST_REMIT
FROM
Remittance AS p inner join
(SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
claim_id,
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID) as latest_remit
on latest_remit.claim_id = p.claim_id;
This will give you only one row. Untested (so please run and make changes).
Without having more information on the structure of your database -- especially the structure of Claims_Group2 and REMITTANCE, and the relationship between them, it's not really possible to advise you on how to introduce a remittance UUID into DATE_OF_LATEST_REMIT.
Since you are using SQL Server, however, it is possible to use a window function to introduce a synthetic means to choose among remittances having the same timestamp. For example, it looks like you could approach the problem something like this:
select *
from (
select
r.*,
row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
from
remittance r
join claims_group2 cg2
on r.remittance_uuid = cg2.remittance_uuid
where
r.active = 0
and r.billed_amount > 0
and cg2.active = 0
and cg2.billed_amount > 0
) t
where t.rn = 1
Note that that that does not depend on your DATE_OF_LATEST_REMIT table at all, it having been subsumed into the inline view. Note also that this will introduce one extra column into your results, though you could avoid that by enumerating the columns of table remittance in the outer select clause.
It also seems odd to be filtering on two sets of active and billed_amount columns, but that appears to follow from what you were doing in your original queries. In that vein, I urge you to check the results carefully, as lifting the filter conditions on cg2 columns up to the level of the join to remittance yields a result that may return rows that the original query did not (but never more than one per claim_id).
A co-worker offered me this elegant demonstration of a solution. I'd never used "over" or "partition" before. Works great! Thank you John and Gaurasvsa for your input.
if OBJECT_ID('tempdb..#t') is not null
drop table #t
select *, ROW_NUMBER() over (partition by CLAIM_ID order by CLAIM_ID) as ROW_NUM
into #t
from
(
select '2018-08-15 13:07:50.933' as CREATE_DATE, 1 as CLAIM_ID, NEWID() as
REMIT_UUID
union select '2018-08-15 13:07:50.933', 1, NEWID()
union select '2017-12-31 10:00:00.000', 2, NEWID()
) x
select *
from #t
order by CLAIM_ID, ROW_NUM
select CREATE_DATE, MAX(CLAIM_ID), MAX(REMIT_UUID)
from #t
where ROW_NUM = 1
group by CREATE_DATE

SQL Server CTE use IDs from single column with EXCEPT?

Having received kindness the other day from someone whose eyes were less bleary than mine I thought I'd give it another shot. Thanks in advance for your assistance.
I have a single SQL Server (2012) table named Contacts. That table has four columns I am currently concerned with. The table has a total of 71,454 rows. There are two types of records in the table; Companies and Employees. Both use the same column, named (Client ID), for their primary key. The existence of a Company Name is what differentiates between Company and Employee data. Employees have no associated Company Name. There are 29,021 Companies leaving 42,433 Employees.
There may be 0-n number of Employees associated with any one Company. I am attempting to create output that will reflect the relationship between Companies and Clients, if there are any. I would like to use the Company ID (Client ID column) as my anchor data set.
Not sure my definition is correct but the thought was to create a CTE of the known Companies by virtue of a given Company Name. Then, use the remaining Client IDs but use the EXCEPT clause to filter the already-retrieved Client IDs out of the result set.
Here the code I currently have;
;
WITH cte ( BaseID, Client_id, Company_name,
First_name, Last_name, [level] )
AS ( SELECT Client_id AS BaseID ,
Client_id ,
Company_name ,
First_name ,
Last_name ,
1
FROM dbo.Conv_client_clean
WHERE ( COMPANY_NAME IS NOT NULL
OR COMPANY_NAME != ''
)
UNION ALL
SELECT c.BaseID ,
children.Client_id ,
children.Company_name ,
children.First_name ,
children.Last_name ,
cte.[level] + 1
FROM dbo.Conv_client_clean children
INNER JOIN cte c ON c.Client_id = children.CLIENT_ID
EXCEPT
SELECT children.Client_id
FROM cte
)
SELECT BaseID ,
Client_id ,
Company_name ,
first_name ,
Last_name ,
[Level]
FROM cte
OPTION ( MAXRECURSION 0 );
In this instance I receive the following error;
Msg 252, Level 16, State 1, Line 3
Recursive common table expression 'cte' does not contain a top-level UNION ALL operator.
Any suggestions?
Thanks!
In the recursion cte query, you cannot have more set operations(union, except, union all,intersect) after the the one Union ALL which is refers the cte itself. I think what you can try is change the query as below and check
...
UNION ALL
SELECT c.BaseID ,
children.Client_id ,
children.Company_name ,
children.First_name ,
children.Last_name ,
cte.[level] + 1
FROM dbo.Conv_client_clean children
WHERE children.Client_id NOT IN (SELECT Client_id FROM cte)
As mentioned to Kiran I was able to concoct an 'old fashioned' approach what is good enough for now.
Thank you everyone for your kind attention.
I'm not sure what you are trying to do with level. It seems that it will be 1 for companies and 2 for employees. If that's the case, you don't even need recursion. The first part of your cte creates a list of companies. That's fine. Now use that to join back to the original table to show all the employees too.
WITH
cte( BaseID, ClientID, Company_name, First_name, Last_name )AS(
SELECT Base_ID,
Base_ID AS Client_id ,
Company_name,
First_name,
Last_name
FROM dbo.Conv_client_clean
WHERE COMPANY_NAME IS NOT NULL
OR COMPANY_NAME <> ''
)
select c2.Base_id, c2.Client_id,
c1.Company_Name, c2.First_Name, c2.Last_Name,
case when c2.client_id is null then 1 else 2 end Level
from cte c1
join Conv_client_clean c2
on c1.BaseID = isnull( c2.Client_ID, c2.Base_id )
order by c1.BaseID, c2.Base_id;
Here's where I fiddled with it.
Unfortunately anything besides UNION ALL, after you've made your recursive reference, will not work. And if you think about it, it makes sense.
Recursion is conceptually identical to the following where recursion continues until max depth is reached or a query returns no results upon which another execution could act.
WITH Anchor AS (select...)
,recurse1 as (<Some body referring to Anchor>)
,recurse2 as (<Identical body except referring to recurse1>)
,recurse3 as (<Identical body except referring to recurse2>)
...
select * from Anchor
union all
select * from recurse1
union all
select * from recurse2
...
The problem is that conjunctive operators apply to EVERYTHING that precedes it. In your case, EXCEPT operates on everything to it's left side which includes the Anchor query. Afterwards, when looking for the anchor to which the recursive part must be applied, the query compiler doesn't find a 'top level union all operator' any more because it's been consumed as part of the left side of your recursive query.
It wouldn't help to contrive some syntax akin to parenthesis that could delimit the scope of the left side of your table conjunction because you would then build a case of 'multiple recursive references' which is also illegal.
BOTTOM LINE IS: The only conjunction that works in the recursive part of your query is UNION ALL because it simply concatenates the right side. It doesn't require knowledge of the left side to determine which rows to include.

Select exactly one row for each employee using unordered field as criteria

I have a data set that looks like the following.
EMPLID PHONE_TYPE PHONE
------ ---------- --------
100 HOME 111-1111
100 WORK 222-2222
101 HOME 333-3333
102 WORK 444-4444
103 OTHER 555-5555
I want to select exactly one row for each employee using the PHONE_TYPE field to establish preferences. I want the HOME phone number if the employee has one as is the case for employee 100 and 101. If the HOME number is not present, I want the WORK number (employee 102), and as a last resort I'll take the OTHER number as with employee 103. In reality my table has about a dozen values for the PHONE_TYPE field, so I need to be able to extend any solution to include more than just the three values I've shown in the example. Any thoughts? Thanks.
You need to add a phone_types table (Phone_Type TEXT(Whatever), Priority INTEGER). In this table, list each Phone_Type value once and assign a priority to it (in your example, HOME would be 1, WORK 2, OTHER 3 and so on).
Then, create a view that joins the Priority column from Phone_Types to your Phone_Numbers table (imagine we call it Phone_Numbers_Ex).
Now, you have several options for how to get record from Phone_Numbers_Ex with the MIN(Priority) for a given emplID, of which probably the clearest is:
SELECT * FROM Phone_Numbers_Ex P1 WHERE NOT EXISTS
(SELECT * FROM Phone_Numbers_Ex P2 WHERE P2.EmplID = P1.EmplID AND P2.Priority < P1.Priority)
Another way is to declare another view, or inner query, along the lines of SELECT EmplID, MIN(Priority) AS Priority FROM Phone_Numbers_Ex GROUP BY EmplID and then joining this back Phone_Numbers_Ex on both EmplID and Priority.
I forget, does Server 2000 support Coalesce? If it does, I think this will work:
Select Distinct EmplID, Coalesce(
(Select Phone from Employees where emplid = e1.emplid and phone_type = 'HOME'),
(Select Phone from Employees where emplid = e1.emplid and phone_type = 'WORK'),
(Select Phone from Employees where emplid = e1.emplid and phone_type = 'OTHER')
) as Phone
From Employees e1
Your requirements may not be complete if an employee is allowed to have more than one phone number for a given phone type. I've added a phone_number_id just to make things unique and assumed that you would want the lowest id if the person has two phones of the same type. That's pretty arbitrary, but you can replace it with your own business logic.
I've also assumed some kind of a Phone_Types table that includes your priority for which phone number should be used. If you don't already have this table, you should probably add it. If nothing else, it lets you constrain the phone types with a foreign key.
SELECT
PN1.employee_id,
PN1.phone_type,
PN1.phone_number
FROM
Phone_Numbers PN1
INNER JOIN Phone_Types PT1 ON
PT1.phone_type = PN1.phone_type
WHERE
NOT EXISTS
(
SELECT *
FROM
Phone_Numbers PN2
INNER JOIN Phone_Types PT2 ON
PT2.phone_type = PN2.phone_type AND
(
(PT2.priority < PT1.priority)
--OR (PT2.priority = PT1.priority AND PN2.phone_number_id > PN1.phone_number_id)
)
)
You could also implement this with a LEFT JOIN instead of the NOT EXISTS or you could use TOP if you were looking for the phone number for a single employee. Just do a TOP 1 ORDER BY priority, phone_number_id.
Finally, if you were to move up to SQL 2005 or SQL 2008, you could use a CTE with ROWNUMBER() OVER (ORDER BY priority, phone_number, PARTITION BY employee_id) <- I think my syntax may be slightly off with the parentheses on that, but hopefully it's clear enough. That would allow you to get the top one for all employees by checking that ROWNUMBER() = 1.
As an alternative g.d.d.c's answer that uses queries in the Select clause you could use left joins. You might get better perf, but you should test of course.
SELECT
e1.iD,
Coalesce(phoneHome.Phone,phoneWork.Phone,phoneOther) phone
FROm
employees e1
LEFT JOIN phone phoneHome
ON e1.emplId = phoneHome
and phone_type = 'HOME'
LEFT JOIN phone phoneWork
ON e1.emplId = phoneWork
and phone_type = 'WORK'
LEFT JOIN phone phoneWork
ON e1.emplId = phoneOTHER
and phone_type = 'OTHER'