SQL query help with non-unique duplicates

I can't think through this one. I have this query:
SELECT
p.person_id,
p.first_nm,
p.last_nm,
pu.purchase_dt,
pr.sku,
pr.description,
a.address_type_id,
a.city_cd,
a.state_cd,
a.postal_cd
FROM
person p
INNER JOIN address a ON p.person_id = a.person_id
INNER JOIN purchase pu ON pu.person_id = p.person_id
INNER JOIN product pr ON pr.product_id = pu.product_id
Simple enough - I just need to get the information for customers that we've shipped returns to. However, because of the addressType table
AddressType
address_type_id address_type_desc
------------------------------------
1 Home
2 Shipping
some customers have multiple addresses in the address table, creating non-unique duplicate entries like this.
1,Smith, John, 12/01/2009, A12345, Purple Widget, 1, Anywhere, CA, 12345
1,Smith, John, 12/01/2009, A12345, Purple Widget, 2, Somewhere, ID, 54321
I'd like the query to return just one row per person: the home address if available, otherwise the shipping address.
This seems simple enough, and maybe it's just my cold, but this is causing me to scratch my head somewhat.

You want to change your join so that it returns only the address with the minimum address_type_id for each person, instead of all of them:
INNER JOIN address a ON p.person_id = a.person_id
inner join (select person_id, min(address_type_id) as min_addr
from address group by person_id) a_min
on a.person_id = a_min.person_id and a.address_type_id = a_min.min_addr
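
To see the effect of the MIN(address_type_id) join, here is a minimal sketch using Python's built-in sqlite3 module. The table layout is trimmed to the columns that matter and the sample rows are invented for illustration; it relies on Home being type 1 and Shipping type 2, as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (person_id INTEGER PRIMARY KEY, first_nm TEXT, last_nm TEXT);
    CREATE TABLE address (person_id INTEGER, address_type_id INTEGER, city_cd TEXT);
    INSERT INTO person  VALUES (1, 'John', 'Smith'), (2, 'Jane', 'Doe');
    -- Person 1 has home (1) and shipping (2); person 2 has shipping only.
    INSERT INTO address VALUES (1, 1, 'Anywhere'), (1, 2, 'Somewhere'), (2, 2, 'Elsewhere');
""")

rows = conn.execute("""
    SELECT p.person_id, a.address_type_id, a.city_cd
    FROM person p
    INNER JOIN address a ON p.person_id = a.person_id
    INNER JOIN (SELECT person_id, MIN(address_type_id) AS min_addr
                FROM address
                GROUP BY person_id) a_min
            ON a.person_id = a_min.person_id
           AND a.address_type_id = a_min.min_addr
    ORDER BY p.person_id
""").fetchall()

print(rows)  # one row per person: home if present, otherwise shipping
```

This only works as long as the preferred address type has the lowest id. If a new type with id 0 or 3 were added, the preference order would have to be expressed some other way.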

SELECT
p.person_id,
p.first_nm,
p.last_nm,
pu.purchase_dt,
pr.sku,
pr.description,
COALESCE(ha.address_type_id, sa.address_type_id) AS address_type_id,
CASE WHEN ha.address_type_id IS NOT NULL THEN ha.city_cd ELSE sa.city_cd END AS city_cd,
CASE WHEN ha.address_type_id IS NOT NULL THEN ha.state_cd ELSE sa.state_cd END AS state_cd,
CASE WHEN ha.address_type_id IS NOT NULL THEN ha.postal_cd ELSE sa.postal_cd END AS postal_cd
FROM
person p
LEFT JOIN address ha ON p.person_id = ha.person_id AND ha.address_type_id = 1
LEFT JOIN address sa ON p.person_id = sa.person_id AND sa.address_type_id = 2
INNER JOIN purchase pu ON pu.person_id = p.person_id
INNER JOIN product pr ON pr.product_id = pu.product_id
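
A reduced version of the two-LEFT-JOIN approach, sketched with Python's sqlite3 module and invented sample data. Person 3 has no address at all, which shows one property of this variant: such a person is still returned, with NULL address columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (person_id INTEGER PRIMARY KEY, last_nm TEXT);
    CREATE TABLE address (person_id INTEGER, address_type_id INTEGER, city_cd TEXT);
    INSERT INTO person  VALUES (1, 'Smith'), (2, 'Doe'), (3, 'Roe');
    INSERT INTO address VALUES (1, 1, 'Anywhere'), (1, 2, 'Somewhere'), (2, 2, 'Elsewhere');
""")

rows = conn.execute("""
    SELECT p.person_id,
           COALESCE(ha.address_type_id, sa.address_type_id) AS address_type_id,
           CASE WHEN ha.address_type_id IS NOT NULL
                THEN ha.city_cd ELSE sa.city_cd END AS city_cd
    FROM person p
    LEFT JOIN address ha ON p.person_id = ha.person_id AND ha.address_type_id = 1
    LEFT JOIN address sa ON p.person_id = sa.person_id AND sa.address_type_id = 2
    ORDER BY p.person_id
""").fetchall()

print(rows)
```

The sketch assumes each person has at most one address per type. Two home addresses for the same person would bring the duplicate rows back.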

If you are on SQL Server, or another database that supports common table expressions (CTEs) and window functions, you could do the following. The CTE adds a row-number column that is partitioned by person and ordered by address_type_id. The main query then returns only row number 1 for each person from the CTE.
WITH cte AS
(
SELECT
a.person_id,
a.address_type_id,
a.city_cd,
a.state_cd,
a.postal_cd,
ROW_NUMBER() OVER (PARTITION BY a.person_id ORDER BY a.address_type_id) AS sequence
FROM address a
INNER JOIN AddressType at ON a.address_type_id = at.address_type_id
)
SELECT
p.person_id,
p.first_nm,
p.last_nm,
pu.purchase_dt,
pr.sku,
pr.description,
a.address_type_id,
a.city_cd,
a.state_cd,
a.postal_cd
FROM
person p
INNER JOIN cte a ON p.person_id = a.person_id
INNER JOIN purchase pu ON pu.person_id = p.person_id
INNER JOIN product pr ON pr.product_id = pu.product_id
WHERE
a.sequence = 1
By the way, if you have person records that have no addresses, you might want to change the INNER JOIN to an OUTER JOIN on the addresses table (cte in my answer). This may also be appropriate for joins to purchase and product if your requirements indicate so.
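
One detail to watch when switching to an outer join: a WHERE a.sequence = 1 filter runs after the join and discards the rows where the address side is NULL, so people without addresses disappear again. Moving the condition into the ON clause avoids that. Below is a small sketch with Python's sqlite3 module (window functions need SQLite 3.25 or newer); the data is made up and the CTE is reduced to the address table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (person_id INTEGER PRIMARY KEY, last_nm TEXT);
    CREATE TABLE address (person_id INTEGER, address_type_id INTEGER, city_cd TEXT);
    INSERT INTO person  VALUES (1, 'Smith'), (2, 'Doe');   -- person 2 has no address
    INSERT INTO address VALUES (1, 2, 'Somewhere'), (1, 1, 'Anywhere');
""")

CTE = """
    WITH cte AS (
        SELECT a.person_id, a.address_type_id, a.city_cd,
               ROW_NUMBER() OVER (PARTITION BY a.person_id
                                  ORDER BY a.address_type_id) AS sequence
        FROM address a
    )
"""

# Filter in WHERE: the outer join is effectively turned back into an inner join.
filtered_in_where = conn.execute(CTE + """
    SELECT p.person_id, a.city_cd
    FROM person p
    LEFT OUTER JOIN cte a ON p.person_id = a.person_id
    WHERE a.sequence = 1
    ORDER BY p.person_id
""").fetchall()

# Filter in ON: people without any address are kept, with NULL address columns.
filtered_in_on = conn.execute(CTE + """
    SELECT p.person_id, a.city_cd
    FROM person p
    LEFT OUTER JOIN cte a ON p.person_id = a.person_id AND a.sequence = 1
    ORDER BY p.person_id
""").fetchall()

print(filtered_in_where)
print(filtered_in_on)
```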

Related

Joining tables and finding values that do not exist

I am having some issues with joining tables to get null values, and I can't find what I am doing wrong.
The case: I am trying to make a cinema system, where I have made entities that match the cinema.
I have Hall, Row and Seat tables, and a Show table that holds the movies and the hall each one will be played in. To tie everything together, I have made a Reservation table that keeps track of which seats are taken for a specific show.
My entities look like this:
My problem: I am trying to fetch all free seats for a show. I can get all seats for the show, but when I add the Reservation table to get only the free ones, I get no records.
My query that is able to fetch all seats:
SELECT show.id AS ShowID,
seat.id AS SeatID,
seat.rowid AS RowID,
show.hallid AS HallId,
reservation.seatid AS Expr1
FROM show
INNER JOIN hall
ON show.hallid = hall.id
FULL OUTER JOIN seat
ON hall.id = seat.hallid
LEFT OUTER JOIN reservation
ON reservation.showid = show.id
WHERE ( show.id = 1 )
AND ( reservation.seatid IS NULL )
ORDER BY reservation.showid,
rowid

You need INNER joins between Show, Hall, Row and Seat and a LEFT join to Reservation, so you can filter out the matched rows:
SELECT s.Id AS ShowID, t.Id AS SeatID, t.RowId AS RowID, s.HallId
FROM Show s
INNER JOIN Hall h ON h.Id = s.HallId
INNER JOIN Seat t ON t.HallId = h.Id
INNER JOIN Row w ON w.HallId = h.Id AND w.Id = t.RowId
LEFT JOIN Reservation r ON r.ShowId = s.Id AND r.HallId = h.Id AND r.SeatId = t.Id AND r.RowId = w.Id
WHERE (s.Id = 1) AND (r.SeatId IS NULL)
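
Why the seat has to be part of the LEFT JOIN condition can be seen in a small sketch with Python's sqlite3 module. The schema is cut down to Show, Seat and Reservation (Hall and Row are left out) and the rows are invented. When Reservation is joined on the show alone, every seat row picks up some reservation as soon as the show has one, so the IS NULL filter matches nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Show        (Id INTEGER PRIMARY KEY, HallId INTEGER);
    CREATE TABLE Seat        (Id INTEGER PRIMARY KEY, HallId INTEGER);
    CREATE TABLE Reservation (ShowId INTEGER, SeatId INTEGER);
    INSERT INTO Show        VALUES (1, 10), (2, 10);
    INSERT INTO Seat        VALUES (100, 10), (101, 10), (102, 10);
    INSERT INTO Reservation VALUES (1, 100), (2, 101);
""")

# Joined on the show only: each seat matches the reservation for seat 100.
joined_on_show_only = [row[0] for row in conn.execute("""
    SELECT t.Id
    FROM Show s
    INNER JOIN Seat t ON t.HallId = s.HallId
    LEFT JOIN Reservation r ON r.ShowId = s.Id
    WHERE s.Id = 1 AND r.SeatId IS NULL
    ORDER BY t.Id
""")]

# Joined on show and seat: only genuinely unreserved seats have a NULL match.
free_seats = [row[0] for row in conn.execute("""
    SELECT t.Id
    FROM Show s
    INNER JOIN Seat t ON t.HallId = s.HallId
    LEFT JOIN Reservation r ON r.ShowId = s.Id AND r.SeatId = t.Id
    WHERE s.Id = 1 AND r.SeatId IS NULL
    ORDER BY t.Id
""")]

print(joined_on_show_only, free_seats)
```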

Replace:
INNER JOIN Hall ON Show.Id = Hall.Id FULL OUTER JOIN
With:
INNER JOIN Hall ON Show.HallId = Hall.Id FULL OUTER JOIN
While it might not be the full answer to your question, I think this might cause issues for you too.

Add a new column to SELECT if elements of an INNER JOIN exists

I have 3 tables, PERSONS, COMPANIES and PERSON_CUSTOMER_COMPANY. The last one models the n-to-n relation that exists when a person is a customer of a company (a person may have other relations with each company).
This query returns all companies that have a relation with a given person as a Customer (line 3 inner join).
select co.name from COMPANIES co
INNER JOIN PERSONS p on p.COMPANY_ID = co.id
INNER JOIN PERSON_CUSTOMER_COMPANY pcc on pcc.PERSON_ID = p.PERSON_ID
WHERE p.PERSON_ID = 123456;
I need to change this query to return all companies for a person even if they are not related in PERSON_CUSTOMER_COMPANY, plus an extra field indicating whether the person is a customer of the company.
Something like "isCustomer"
select co.name, isCustomer from COMPANIES co ...

An inner join will only return results that match in both tables. Since you are also looking for companies that don't have records in the person_customer_company table, you need an outer join instead. Then you can use a CASE expression to create the new column:
SELECT co.name,
CASE WHEN pcc.Person_id IS NULL then 'No' else 'Yes' End as IsCustomer
FROM COMPANIES co
INNER JOIN PERSONS p on p.COMPANY_ID = co.id
LEFT JOIN PERSON_CUSTOMER_COMPANY pcc on pcc.PERSON_ID = p.PERSON_ID
WHERE p.PERSON_ID = 123456;

I would probably use exists:
SELECT co.name,
(CASE WHEN EXISTS (SELECT 1
FROM PERSONS p
WHERE p.COMPANY_ID = co.id AND
p.PERSON_ID = 123456
)
THEN 'Yes' ELSE 'No'
END) as IsCustomer
FROM COMPANIES co;
This only uses PERSONS, because that is the JOIN you use in your query.
I suspect you really want:
SELECT co.name,
(CASE WHEN EXISTS (SELECT 1
FROM PERSON_CUSTOMER_COMPANY pcc
WHERE pcc.COMPANY_ID = co.id AND
pcc.PERSON_ID = 123456
)
THEN 'Yes' ELSE 'No'
END) as IsCustomer
FROM COMPANIES co;
In either case, only two tables are necessary.
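
The EXISTS version can be tried out with a short sketch using Python's sqlite3 module. The COMPANY_ID column on PERSON_CUSTOMER_COMPANY is an assumption (the question never shows that table's columns), and the company names and the second person id are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE COMPANIES (id INTEGER PRIMARY KEY, name TEXT);
    -- COMPANY_ID is an assumed column name for the link table.
    CREATE TABLE PERSON_CUSTOMER_COMPANY (PERSON_ID INTEGER, COMPANY_ID INTEGER);
    INSERT INTO COMPANIES VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO PERSON_CUSTOMER_COMPANY VALUES (123456, 1), (999, 2);
""")

rows = conn.execute("""
    SELECT co.name,
           CASE WHEN EXISTS (SELECT 1
                             FROM PERSON_CUSTOMER_COMPANY pcc
                             WHERE pcc.COMPANY_ID = co.id
                               AND pcc.PERSON_ID = ?)
                THEN 'Yes' ELSE 'No'
           END AS IsCustomer
    FROM COMPANIES co
    ORDER BY co.id
""", (123456,)).fetchall()

print(rows)
```

Because EXISTS only tests for presence, a company appears once even if the link table holds several rows for the same person and company, which a LEFT JOIN would not guarantee.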

SQL LEFT JOIN for joining three tables but one with to exclude content

I have 3 tables
STUDENTS
FEES_PAID
SUSPENDED
I want to get the details of the students who have paid the fees and are not in SUSPENDED.
SELECT
ID
FROM
STUDENTS s
LEFT JOIN
SUSPENDED p ON s.ID = p.ID
INNER JOIN
FEES_PAID f ON f.ID = s.ID
WHERE
s.ID IS NULL
Unfortunately this does not work. Can anyone suggest an efficient query?
Thanks in advance

You need to check whether the row from the second table is missing after the LEFT JOIN, so you need to test a column from that table. Change the WHERE to:
WHERE p.ID IS NULL
Alternatively, use NOT EXISTS:
SELECT s.ID
FROM STUDENTS s INNER JOIN
FEES_PAID f
ON f.ID = s.ID
WHERE NOT EXISTS (SELECT 1 FROM SUSPENDED p WHERE s.ID = p.ID);
Note that for both these queries, you will need to qualify the ID in the SELECT to specify the table where it comes from.
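
Both forms can be checked side by side with a small sketch using Python's sqlite3 module. The three single-column tables and their rows are invented: student 2 has paid but is suspended, student 3 has not paid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENTS  (ID INTEGER PRIMARY KEY);
    CREATE TABLE FEES_PAID (ID INTEGER);
    CREATE TABLE SUSPENDED (ID INTEGER);
    INSERT INTO STUDENTS  VALUES (1), (2), (3);
    INSERT INTO FEES_PAID VALUES (1), (2);
    INSERT INTO SUSPENDED VALUES (2);
""")

# Anti-join: keep rows where the LEFT JOIN found no SUSPENDED match.
via_left_join = [row[0] for row in conn.execute("""
    SELECT s.ID
    FROM STUDENTS s
    LEFT JOIN SUSPENDED p ON s.ID = p.ID
    INNER JOIN FEES_PAID f ON f.ID = s.ID
    WHERE p.ID IS NULL
    ORDER BY s.ID
""")]

# Same result expressed with NOT EXISTS.
via_not_exists = [row[0] for row in conn.execute("""
    SELECT s.ID
    FROM STUDENTS s
    INNER JOIN FEES_PAID f ON f.ID = s.ID
    WHERE NOT EXISTS (SELECT 1 FROM SUSPENDED p WHERE s.ID = p.ID)
    ORDER BY s.ID
""")]

print(via_left_join, via_not_exists)
```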

This should work:
SELECT
s.ID
FROM
STUDENTS s
LEFT JOIN
SUSPENDED p
ON s.ID=p.ID
INNER JOIN
FEES_PAID f
ON f.ID= s.ID
WHERE
p.ID IS NULL

Selecting single column multiple times based on different conditions

I have written a SQL query to retrieve required data and it looks like given below:
SELECT distinct p.person_id,p.birth_date,p.gender_code,
wm_concat(distinct r.race_code) as race_code,p.hispanic_latino_code,
c.clinically_diagnosed_code,
wm_concat(distinct c.characteristic_code) as chara_codes,
p.prev_adopted_code,p.age_adopted,
FIRST_VALUE(pe.removed_date) OVER (ORDER BY pe.removed_date),
count(pe.removed_date) as removal_count,
LAST_VALUE(pe.discharge_date) OVER (ORDER BY pe.discharge_date),
LAST_VALUE(pe.removed_date) OVER (ORDER BY pe.removed_date) as latest_removal_date,pe.created_date,
pe.removal_circumstance_code,wm_concat(distinct rr.removal_reason_code) as removal_reasons,
ps.placement_type_code,ps.icpc_placement_flag,pe.caretaker_structure_code
FROM PERSON p left outer join RACE r on p.person_id = r.person_id
left outer join CHARACTERISTIC c on c.person_id = p.person_id
left outer join PLACEMENT_EPISODE pe on p.person_id = pe.child_id
left outer join PLACEMENT_SETTING ps on p.person_id = ps.child_id
left outer join REMOVAL_REASON rr on pe.placement_episode_id = rr.placement_episode_id
GROUP BY p.person_id,p.birth_date,p.gender_code,p.hispanic_latino_code,
c.clinically_diagnosed_code,p.prev_adopted_code,p.age_adopted,pe.removed_date,
pe.discharge_date,pe.removed_date,pe.created_date,pe.removal_circumstance_code,
ps.placement_type_code,ps.icpc_placement_flag,pe.caretaker_structure_code
ORDER BY p.person_id
In the query above, I have already selected the birth date of the person. Now, in the same select clause, I also want to select birth_date for the persons that satisfy the following conditions:
condition 1: p.person_id = pe.primary_caretaker_id
condition 2: p.person_id = pe.secondary_caretaker_id
Can someone tell me how to select these fields (birth_date based on two different conditions) in the existing query?
Birth_date has been already selected once for individual person. Now I want to retrieve birth_date for primary_caretaker and secondary_caretaker.

You will need to join to the PERSON table twice more:
SELECT distinct p.person_id,p.birth_date,p.gender_code,
wm_concat(distinct r.race_code) as race_code,p.hispanic_latino_code,
c.clinically_diagnosed_code,
wm_concat(distinct c.characteristic_code) as chara_codes,
p.prev_adopted_code,p.age_adopted,
FIRST_VALUE(pe.removed_date) OVER (ORDER BY pe.removed_date),
count(pe.removed_date) as removal_count,
LAST_VALUE(pe.discharge_date) OVER (ORDER BY pe.discharge_date),
LAST_VALUE(pe.removed_date) OVER (ORDER BY pe.removed_date) as latest_removal_date,
pe.created_date,
pe.removal_circumstance_code,wm_concat(distinct rr.removal_reason_code) as removal_reasons,
ps.placement_type_code,ps.icpc_placement_flag,pe.caretaker_structure_code,
primCare.birth_date as primary_carer_birth_date,
secCare.birth_date as secondary_carer_birth_date
FROM PERSON p left outer join RACE r on p.person_id = r.person_id
left outer join CHARACTERISTIC c on c.person_id = p.person_id
left outer join PLACEMENT_EPISODE pe on p.person_id = pe.child_id
left outer join PERSON primCare on primCare.person_id = pe.primary_caretaker_id
left outer join PERSON secCare on secCare.person_id = pe.secondary_caretaker_id
left outer join PLACEMENT_SETTING ps on p.person_id = ps.child_id
left outer join REMOVAL_REASON rr on pe.placement_episode_id = rr.placement_episode_id
GROUP BY p.person_id,p.birth_date,p.gender_code,p.hispanic_latino_code,
c.clinically_diagnosed_code,p.prev_adopted_code,p.age_adopted,pe.removed_date,
pe.discharge_date,pe.removed_date,pe.created_date,pe.removal_circumstance_code,
ps.placement_type_code,ps.icpc_placement_flag,pe.caretaker_structure_code, primCare.birth_date, secCare.birth_date
ORDER BY p.person_id
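
Stripped of the aggregates, the technique is simply joining PERSON again under a different alias for each role, after PLACEMENT_EPISODE is available to join on. A minimal sketch with Python's sqlite3 module, using a reduced schema and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PERSON (person_id INTEGER PRIMARY KEY, birth_date TEXT);
    CREATE TABLE PLACEMENT_EPISODE (child_id INTEGER,
                                    primary_caretaker_id INTEGER,
                                    secondary_caretaker_id INTEGER);
    INSERT INTO PERSON VALUES (1, '2010-05-01'), (2, '1980-01-15'), (3, '1982-07-30');
    -- Child 1 has a primary caretaker (person 2) but no secondary caretaker.
    INSERT INTO PLACEMENT_EPISODE VALUES (1, 2, NULL);
""")

rows = conn.execute("""
    SELECT p.person_id,
           p.birth_date,
           primCare.birth_date AS primary_carer_birth_date,
           secCare.birth_date  AS secondary_carer_birth_date
    FROM PERSON p
    LEFT OUTER JOIN PLACEMENT_EPISODE pe ON p.person_id = pe.child_id
    LEFT OUTER JOIN PERSON primCare ON primCare.person_id = pe.primary_caretaker_id
    LEFT OUTER JOIN PERSON secCare  ON secCare.person_id  = pe.secondary_caretaker_id
    WHERE p.person_id = 1
""").fetchall()

print(rows)
```

The outer joins keep the child's row even when a caretaker is missing; the corresponding birth date is then NULL.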

Issues with joining many tables getting to many values

This is my script:
select c.rendering_id as prov_number, c.begin_date_of_service as date_of_service,
c.practice_id as group_number, v.enc_nbr as invoice, p.person_nbr as patient,
v.enc_nbr as invoice_number, c.charge_id as transaction_number,
t.med_rec_nbr as primary_mrn, p.last_name, p.first_name,
z.payer_id as orig_fsc_number, z.payer_id as curr_fsc_number,
c.location_id as location_number, c.closing_date as posting_date,
c.quantity as service_units, c.amt as charge_amount,
c.cpt4_code_id as procedure_code, r.description as procedure_name,
x.tran_code_id as pay_code_number, ISNULL([modifier_1],'') as modifier_code_1,
ISNULL([modifier_2],'') as modifier_code_2, ISNULL([modifier_3],'') as modifier_code_3,
ISNULL ([icd9cm_code_id],'') as dx_code_1, ISNULL ([icd9cm_code_id_2],'') as dx_code_2,
ISNULL ([icd9cm_code_id_3],'') as dx_code_3, ISNULL ([icd9cm_code_id_4],'') as dx_code_4
from charges c, person p, patient t, patient_encounter v, encounter_payer z, cpt4_code_mstr r, transactions x
where c.person_id = p.person_id
and c.person_id = t.person_id
and c.person_id = v.person_id
and c.person_id = z.person_id
and c.cpt4_code_id = r.cpt4_code_id
and c.person_id = x.person_id
and c.practice_id = '0001'
and c.closing_date >= GetDate() - 7
I should be getting about 14k rows, but with this I am getting a couple hundred thousand. I feel like there should be an inner join here to correct it, but I have read through a bunch of posts and can't seem to get it working. It's by far the biggest pull I have ever done in SQL.
Any help would be greatly appreciated.

Without knowing more about the data structures and foreign key relationships, this answer is just educated speculation. Before answering, though, you need to learn proper JOIN syntax. Your query should look like:
from charges c join
person p
on . . . .
That said, your problem is probably that you are joining along multiple dimensions at the same time. Although it is not explicitly clear, I am guessing that a person could have multiple patient encounters, say A, B, and C. A person might also have multiple charges, say 10, 11, and 12.
Your query will produce nine rows in this case, one for each combination.
In other words, you need to:
Verify the join keys between tables. Is a table called transactions really joined to encounters and charges using the person_id?
Find out where you are getting cross products, and split into two subqueries that are then appropriately joined together.
I would suggest that you start with the first two tables, and see whether you get the expected row count for:
select *
from charges c join
person p
on c.person_id = p.person_id
where c.practice_id = '0001' and
c.closing_date >= GetDate() - 7
Then build up the query one table at a time to get the results you want.
One last note: when using table aliases, I find it much clearer to use aliases that evoke the table. "c" for charges is very good. Consider something like "pe" for patient_encounter, and so on.
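
The three-encounters-times-three-charges case can be reproduced with a short sketch using Python's sqlite3 module. The enc_id column is hypothetical, standing in for whatever key really links a charge to its encounter in the real schema; the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- enc_id is a hypothetical linking column, not taken from the question.
    CREATE TABLE patient_encounter (enc_id INTEGER PRIMARY KEY, person_id INTEGER);
    CREATE TABLE charges (charge_id INTEGER PRIMARY KEY, person_id INTEGER, enc_id INTEGER);
    INSERT INTO patient_encounter VALUES (1, 7), (2, 7), (3, 7);
    INSERT INTO charges VALUES (10, 7, 1), (11, 7, 2), (12, 7, 3);
""")

# Joining on person_id pairs every charge with every encounter of that person.
by_person = conn.execute("""
    SELECT COUNT(*)
    FROM charges c
    JOIN patient_encounter v ON c.person_id = v.person_id
""").fetchone()[0]

# Joining on the more specific key pairs each charge with its own encounter.
by_encounter = conn.execute("""
    SELECT COUNT(*)
    FROM charges c
    JOIN patient_encounter v ON c.enc_id = v.enc_id
""").fetchone()[0]

print(by_person, by_encounter)
```

With several tables all joined on person_id, these multiplications compound, which is consistent with 14k expected rows growing into hundreds of thousands.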

It should look like this with explicit joins (you can also use LEFT JOIN where appropriate):
select c.rendering_id as prov_number, c.begin_date_of_service as date_of_service,
c.practice_id as group_number, v.enc_nbr as invoice, p.person_nbr as patient,
v.enc_nbr as invoice_number, c.charge_id as transaction_number,
t.med_rec_nbr as primary_mrn, p.last_name, p.first_name,
z.payer_id as orig_fsc_number, z.payer_id as curr_fsc_number,
c.location_id as location_number, c.closing_date as posting_date,
c.quantity as service_units, c.amt as charge_amount,
c.cpt4_code_id as procedure_code, r.description as procedure_name,
x.tran_code_id as pay_code_number, ISNULL([modifier_1],'') as modifier_code_1,
ISNULL([modifier_2],'') as modifier_code_2, ISNULL([modifier_3],'') as modifier_code_3,
ISNULL ([icd9cm_code_id],'') as dx_code_1, ISNULL ([icd9cm_code_id_2],'') as dx_code_2,
ISNULL ([icd9cm_code_id_3],'') as dx_code_3, ISNULL ([icd9cm_code_id_4],'') as dx_code_4
from charges c
inner join person p on c.person_id = p.person_id
inner join patient t on c.person_id = t.person_id
inner join patient_encounter v on c.person_id = v.person_id
inner join encounter_payer z on c.person_id = z.person_id
inner join cpt4_code_mstr r on c.cpt4_code_id = r.cpt4_code_id
inner join transactions x on c.person_id = x.person_id
where c.practice_id = '0001'
and c.closing_date >= GetDate() - 7

Now comment out one inner join at a time, run the count query below, and see which of these joins is causing the one-to-many relationship. When the count drops to around 14K, the table you commented out is the one causing it.
Otherwise, the best way is to work out the relationships from the unique keys, primary keys and foreign keys on these tables.
select
count(c.person_id)
from charges c
inner join person p on c.person_id = p.person_id
inner join patient t on c.person_id = t.person_id
inner join patient_encounter v on c.person_id = v.person_id
inner join encounter_payer z on c.person_id = z.person_id
inner join cpt4_code_mstr r on c.cpt4_code_id = r.cpt4_code_id
inner join transactions x on c.person_id = x.person_id
where c.practice_id = '0001'
and c.closing_date >= GetDate() - 7
You can also try
select count(*) from <tablename> group by person_id having count(*) > 1
and repeat that query for each table. This will give you an idea of what kind of relationship exists between the charges table and the other tables. Of course, use cpt4_code_id for the cpt4_code_mstr table, but judging by its name that is a master table, so it should have a single row for each cpt4_code_id value in the charges table.
I hope this helps.
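
The per-table duplicate check can be scripted so it runs against every joined table in one go. This is only a sketch using Python's sqlite3 module with three stand-in tables and invented rows; it counts how many person_id values occur more than once in each table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person            (person_id INTEGER);
    CREATE TABLE patient_encounter (person_id INTEGER);
    CREATE TABLE transactions      (person_id INTEGER);
    INSERT INTO person            VALUES (1), (2);
    INSERT INTO patient_encounter VALUES (1), (1), (2);
    INSERT INTO transactions      VALUES (1), (1), (1), (2), (2);
""")

repeated = {}
for table in ("person", "patient_encounter", "transactions"):
    # Table names come from the fixed tuple above, never from user input.
    sql = f"""
        SELECT COUNT(*)
        FROM (SELECT person_id
              FROM {table}
              GROUP BY person_id
              HAVING COUNT(*) > 1)
    """
    repeated[table] = conn.execute(sql).fetchone()[0]

print(repeated)
```

A count of zero means person_id is unique in that table, so joining on it cannot multiply rows; any table with a non-zero count is a candidate for the blow-up.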
