Issues with joining many tables: getting too many values - sql

This is my script:
select c.rendering_id as prov_number, c.begin_date_of_service as date_of_service,
c.practice_id as group_number, v.enc_nbr as invoice, p.person_nbr as patient,
v.enc_nbr as invoice_number, c.charge_id as transaction_number,
t.med_rec_nbr as primary_mrn, p.last_name, p.first_name,
z.payer_id as orig_fsc_number, z.payer_id as curr_fsc_number,
c.location_id as location_number, c.closing_date as posting_date,
c.quantity as service_units, c.amt as charge_amount,
c.cpt4_code_id as procedure_code, r.description as procedure_name,
x.tran_code_id as pay_code_number, ISNULL([modifier_1],'') as modifier_code_1,
ISNULL([modifier_2],'') as modifier_code_2, ISNULL([modifier_3],'') as modifier_code_3,
ISNULL ([icd9cm_code_id],'') as dx_code_1, ISNULL ([icd9cm_code_id_2],'') as dx_code_2,
ISNULL ([icd9cm_code_id_3],'') as dx_code_3, ISNULL ([icd9cm_code_id_4],'') as dx_code_4
from charges c, person p, patient t, patient_encounter v, encounter_payer z, cpt4_code_mstr r, transactions x
where c.person_id = p.person_id
and c.person_id = t.person_id
and c.person_id = v.person_id
and c.person_id = z.person_id
and c.cpt4_code_id = r.cpt4_code_id
and c.person_id = x.person_id
and c.practice_id = '0001'
and c.closing_date >= GetDate() - 7
I should be getting about 14k rows, but with this I am getting a couple hundred thousand. I feel like there should be an inner join here to correct it, but I have read through a bunch of posts and can't seem to get it working. It's by far the biggest pull I have ever done in SQL.
Any help would be greatly appreciated.

Without knowing more about the data structures and foreign key relationships, this answer is just educated speculation. Before answering, though, you need to learn proper JOIN syntax. Your query should look like:
from charges c join
person p
on . . . .
That said, your problem is probably that you are joining along multiple dimensions at the same time. Although it is not explicit in the question, I am guessing that a person could have multiple patient encounters, say A, B, and C. A person might also have multiple charges, say 10, 11, and 12.
Your query will produce nine rows in this case, one for each combination.
In other words, you need to:
Verify the join keys between tables. Is a table called transactions really joined to encounters and charges using the person_id?
Find out where you are getting cross products, and split the query into two subqueries that are then appropriately joined together.
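The multiplication effect described above can be reproduced on a toy dataset. This is a minimal sketch using SQLite through Python rather than the asker's database; the two tables are cut down to the columns needed and the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE charges (charge_id INTEGER, person_id INTEGER);
    CREATE TABLE patient_encounter (enc_nbr TEXT, person_id INTEGER);
    -- One person with three charges and three encounters (made-up data).
    INSERT INTO charges VALUES (10, 1), (11, 1), (12, 1);
    INSERT INTO patient_encounter VALUES ('A', 1), ('B', 1), ('C', 1);
""")

# Joining only on person_id pairs every charge with every encounter.
rows = conn.execute("""
    SELECT c.charge_id, v.enc_nbr
    FROM charges c
    JOIN patient_encounter v ON v.person_id = c.person_id
    ORDER BY c.charge_id, v.enc_nbr
""").fetchall()

print(len(rows))  # 9
```

Three charges times three encounters gives nine rows, even though neither table has more than three rows for this person.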
I would suggest that you start with the first two tables, and see whether you get the expected row count for:
select *
from charges c join
person p
on c.person_id = p.person_id
where c.practice_id = '0001' and
c.closing_date >= GetDate() - 7
Then build up the query one table at a time to get the results you want.
One last note: when using table aliases, I find it much clearer to use aliases that evoke the table. "c" for charges is very good. Consider something like "pe" for patient_encounter, and so on.
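The "build it up one table at a time" advice can be automated in a few lines. The sketch below is only an illustration: it uses SQLite through Python, a made-up three-table subset of the schema, and a hypothetical `tran_id` column, and prints the row count after each join is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE charges (charge_id INTEGER, person_id INTEGER);
    CREATE TABLE person (person_id INTEGER, last_name TEXT);
    CREATE TABLE transactions (tran_id INTEGER, person_id INTEGER);
    INSERT INTO charges VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO person VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO transactions VALUES (100, 1), (101, 1), (102, 2);
""")

# Joins are appended one at a time, in this order.
joins = [
    "JOIN person p ON p.person_id = c.person_id",
    "JOIN transactions x ON x.person_id = c.person_id",
]

counts = []
sql = "SELECT COUNT(*) FROM charges c"
counts.append(("charges only", conn.execute(sql).fetchone()[0]))
for join in joins:
    sql += " " + join
    counts.append((join, conn.execute(sql).fetchone()[0]))

# The first join whose count jumps is the one multiplying rows.
for label, n in counts:
    print(n, label)
```

With this data the count stays at 3 after joining person and rises to 5 after joining transactions, which points at transactions as the one-to-many join.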

It should look like this (you could also use left joins):
select c.rendering_id as prov_number, c.begin_date_of_service as date_of_service,
c.practice_id as group_number, v.enc_nbr as invoice, p.person_nbr as patient,
v.enc_nbr as invoice_number, c.charge_id as transaction_number,
t.med_rec_nbr as primary_mrn, p.last_name, p.first_name,
z.payer_id as orig_fsc_number, z.payer_id as curr_fsc_number,
c.location_id as location_number, c.closing_date as posting_date,
c.quantity as service_units, c.amt as charge_amount,
c.cpt4_code_id as procedure_code, r.description as procedure_name,
x.tran_code_id as pay_code_number, ISNULL([modifier_1],'') as modifier_code_1,
ISNULL([modifier_2],'') as modifier_code_2, ISNULL([modifier_3],'') as modifier_code_3,
ISNULL ([icd9cm_code_id],'') as dx_code_1, ISNULL ([icd9cm_code_id_2],'') as dx_code_2,
ISNULL ([icd9cm_code_id_3],'') as dx_code_3, ISNULL ([icd9cm_code_id_4],'') as dx_code_4
from charges c
inner join person p on c.person_id = p.person_id
inner join patient t on c.person_id = t.person_id
inner join patient_encounter v on c.person_id = v.person_id
inner join encounter_payer z on c.person_id = z.person_id
inner join cpt4_code_mstr r on c.cpt4_code_id = r.cpt4_code_id
inner join transactions x on c.person_id = x.person_id
where c.practice_id = '0001'
and c.closing_date >= GetDate() - 7

Now comment out one inner join at a time, run the count query below, and see which join is causing a one-to-many relationship. When the count drops to around 14K, the table you commented out is the one causing the one-to-many relationship.
Otherwise, the best way is to work out the relationships from the unique keys, primary keys and foreign keys on these tables.
select
count(c.person_id)
from charges c
inner join person p on c.person_id = p.person_id
inner join patient t on c.person_id = t.person_id
inner join patient_encounter v on c.person_id = v.person_id
inner join encounter_payer z on c.person_id = z.person_id
inner join cpt4_code_mstr r on c.cpt4_code_id = r.cpt4_code_id
inner join transactions x on c.person_id = x.person_id
where c.practice_id = '0001'
and c.closing_date >= GetDate() - 7
You can try
select count(*) from <tablename> group by person_id having count(*) > 1
and repeat the above query for every table; this will give you an idea of the kind of relationship between the charges table and the other tables. Of course, use cpt4_code_id for the cpt4_code_mstr table, but judging by its name that table is a master table, so it should have a single row for each cpt4_code_id value in the charges table.
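As a rough sketch of that duplicate-key check, the snippet below runs the same GROUP BY ... HAVING query against each table in turn. It uses SQLite through Python with made-up data; `keys_with_duplicates` is a hypothetical helper, not something from the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER, last_name TEXT);
    CREATE TABLE patient_encounter (enc_nbr TEXT, person_id INTEGER);
    INSERT INTO person VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO patient_encounter VALUES ('A', 1), ('B', 1), ('C', 2);
""")

def keys_with_duplicates(table, key):
    # Table and column names are hard-coded, trusted identifiers here;
    # identifiers cannot be passed as bound parameters.
    sql = (f"SELECT {key}, COUNT(*) FROM {table} "
           f"GROUP BY {key} HAVING COUNT(*) > 1 ORDER BY {key}")
    return conn.execute(sql).fetchall()

report = {t: keys_with_duplicates(t, "person_id")
          for t in ("person", "patient_encounter")}
print(report)
```

An empty list means the key is unique in that table; any (key, count) pairs returned show a table that will multiply rows when joined on that key alone.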
I hope it will help

Related

Joining tables and finding values that do not exist

I am having some issues with joining tables to get null values, and I can't find what I am doing wrong.
The case: I am trying to make a cinema system, where I have made entities that match the cinema.
I have a Hall, Row and Seat table, and a Show table that holds the movies and the hall each will be played in. To tie everything together, I have made a Reservation table that keeps track of which seats are taken for a specific show.
My entities look like this:
My problem: I am trying to fetch all free seats for a show. I can get all seats for the show, but when I try to add the Reservation table to get the free ones, I get no records.
My query that is able to fetch all seats:
SELECT show.id AS ShowID,
seat.id AS SeatID,
seat.rowid AS RowID,
show.hallid AS HallId,
reservation.seatid AS Expr1
FROM show
INNER JOIN hall
ON show.hallid = hall.id
FULL OUTER JOIN seat
ON hall.id = seat.hallid
LEFT OUTER JOIN reservation
ON reservation.showid = show.id
WHERE ( show.id = 1 )
AND ( reservation.seatid IS NULL )
ORDER BY reservation.showid,
rowid
You need INNER joins between Show, Hall, Row and Seat and a LEFT join to Reservation, so you can filter out the matched rows:
SELECT s.Id AS ShowID, t.Id AS SeatID, t.RowId AS RowID, s.HallId
FROM Show s
INNER JOIN Hall h ON h.Id = s.HallId
INNER JOIN Seat t ON t.HallId = h.Id
INNER JOIN Row w ON w.HallId = h.Id AND w.Id = t.RowId
LEFT JOIN Reservation r ON r.ShowId = s.Id AND r.HallId = h.Id AND r.SeatId = t.Id AND r.RowId = w.Id
WHERE (s.Id = 1) AND (r.SeatId IS NULL)
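To see why the join condition to Reservation matters, here is a small sketch using SQLite through Python. The schema is a simplified, hypothetical version of the one in the question (no Row table, and `shows` instead of `Show` to avoid keyword clashes), with made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shows (id INTEGER, hall_id INTEGER);
    CREATE TABLE seat (id INTEGER, hall_id INTEGER);
    CREATE TABLE reservation (show_id INTEGER, seat_id INTEGER);
    INSERT INTO shows VALUES (1, 1);
    INSERT INTO seat VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO reservation VALUES (1, 2);  -- seat 2 is taken for show 1
""")

# Joining reservation on the show only: every seat row finds the one
# reservation, so nothing survives the IS NULL filter.
show_only = conn.execute("""
    SELECT t.id
    FROM shows s
    JOIN seat t ON t.hall_id = s.hall_id
    LEFT JOIN reservation r ON r.show_id = s.id
    WHERE s.id = 1 AND r.seat_id IS NULL
""").fetchall()

# Joining on show AND seat: only seats without a reservation stay NULL.
free_seats = conn.execute("""
    SELECT t.id
    FROM shows s
    JOIN seat t ON t.hall_id = s.hall_id
    LEFT JOIN reservation r ON r.show_id = s.id AND r.seat_id = t.id
    WHERE s.id = 1 AND r.seat_id IS NULL
    ORDER BY t.id
""").fetchall()

print(show_only)   # []
print(free_seats)  # [(1,), (3,)]
```

The first query mirrors the "I get no records" symptom; the second is the LEFT JOIN ... IS NULL pattern with the seat included in the join condition.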
Replace:
INNER JOIN Hall ON Show.Id = Hall.Id FULL OUTER JOIN
With:
INNER JOIN Hall ON Show.HallId = Hall.Id FULL OUTER JOIN
While it might not be the full answer to your question, I think this might cause issues for you too.

Oracle SQL GROUP BY clause containing joins

I'm having issues writing this query correctly. Below is the goal, my current query, and attached are the scripts to build and populate the database. Thanks for any assistance!
For each DVD in the catalog, display its title, length, release_date, and how many times it has been checked out by all customers across all libraries. Include those that have not been checked out yet (display as 0). Sort results by title.
SELECT C.TITLE, D.LENGTH, C.RELEASE_DATE, COUNT(T.TRANSACTION_ID)
FROM catalog_item C
INNER JOIN dvd D ON D.CATALOG_ITEM_ID = C.CATALOG_ITEM_ID
INNER JOIN physical_item P ON P.CATALOG_ITEM_ID = C.CATALOG_ITEM_ID
LEFT OUTER JOIN transaction T ON T.PHYSICAL_ITEM_ID = P.PHYSICAL_ITEM_ID
GROUP BY C.TITLE;
Run first: https://drive.google.com/open?id=1PYAZV4KIfZtxP4eQn35zsczySsxDM7ls
Run second: https://drive.google.com/open?id=1pAzWmJqvD3o3n6YJqVUM6TtxDafKGd3f
EDIT
I've gotten the query working but haven't figured out how to get DVDs with zero checkouts to show up. Below is my updated query.
SELECT C.TITLE, D.LENGTH, C.RELEASE_DATE, COUNT(T.TRANSACTION_ID)
FROM catalog_item C
INNER JOIN dvd D ON D.CATALOG_ITEM_ID = C.CATALOG_ITEM_ID
INNER JOIN physical_item P ON P.CATALOG_ITEM_ID = C.CATALOG_ITEM_ID
LEFT OUTER JOIN transaction T ON T.PHYSICAL_ITEM_ID = P.PHYSICAL_ITEM_ID
GROUP BY C.TITLE, D.LENGTH, C.RELEASE_DATE;
SOLVED
Figured out the issue with GMB's help. Final query below!
SELECT C.TITLE, D.LENGTH, C.RELEASE_DATE, NVL(COUNT(T.TRANSACTION_ID), 0) AS NUMBER_OF_CHECKOUTS
FROM catalog_item C
INNER JOIN dvd D ON D.CATALOG_ITEM_ID = C.CATALOG_ITEM_ID
LEFT JOIN physical_item P ON P.CATALOG_ITEM_ID = C.CATALOG_ITEM_ID
LEFT JOIN transaction T ON T.PHYSICAL_ITEM_ID = P.PHYSICAL_ITEM_ID
GROUP BY C.TITLE, D.LENGTH, C.RELEASE_DATE
ORDER BY C.TITLE;
Your second query sure looks better than the first one, as it has the correct GROUP BY clause.
It is hard to give a 100% sure response without seeing the full table structures. However, if you are still missing records with 0 checkouts in the output, it means that one of your INNER JOINs is not matching. In other words, you have DVDs in your catalog that are either not present in the dvd table, or not present in the physical_item table. As both look like reference tables, this could indicate a discrepancy in your data. I would recommend changing all INNER JOINs to LEFT JOINs to work around this issue.
Also please note that if no checkout happened for a DVD, COUNT(T.TRANSACTION_ID) yields 0 rather than NULL, because COUNT over a column only counts non-NULL values. Wrapping it in NVL is therefore harmless but not strictly needed.
New query :
SELECT C.TITLE, D.LENGTH, C.RELEASE_DATE, NVL(COUNT(T.TRANSACTION_ID), 0)
FROM catalog_item C
LEFT JOIN dvd D ON D.CATALOG_ITEM_ID = C.CATALOG_ITEM_ID
LEFT JOIN physical_item P ON P.CATALOG_ITEM_ID = C.CATALOG_ITEM_ID
LEFT JOIN transaction T ON T.PHYSICAL_ITEM_ID = P.PHYSICAL_ITEM_ID
GROUP BY C.TITLE, D.LENGTH, C.RELEASE_DATE;
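A quick way to check how COUNT behaves over LEFT JOINs is to try it on a toy dataset. The sketch below uses SQLite through Python rather than Oracle, with made-up data and a hypothetical `checkout` table standing in for `transaction` (which is a keyword in SQLite).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE catalog_item (catalog_item_id INTEGER, title TEXT);
    CREATE TABLE physical_item (physical_item_id INTEGER, catalog_item_id INTEGER);
    CREATE TABLE checkout (transaction_id INTEGER, physical_item_id INTEGER);
    INSERT INTO catalog_item VALUES (1, 'Alien'), (2, 'Brazil'), (3, 'Casablanca');
    INSERT INTO physical_item VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO checkout VALUES (100, 10), (101, 10), (102, 11);
""")

# COUNT(column) skips NULLs, so unmatched titles get 0 without any wrapper.
rows = conn.execute("""
    SELECT c.title, COUNT(t.transaction_id) AS number_of_checkouts
    FROM catalog_item c
    LEFT JOIN physical_item p ON p.catalog_item_id = c.catalog_item_id
    LEFT JOIN checkout t ON t.physical_item_id = p.physical_item_id
    GROUP BY c.title
    ORDER BY c.title
""").fetchall()

print(rows)  # [('Alien', 3), ('Brazil', 0), ('Casablanca', 0)]
```

'Brazil' has a physical copy that was never checked out and 'Casablanca' has no physical copy at all; both still appear, with a count of 0, because every join after catalog_item is a LEFT JOIN.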

WHERE clause in an SQL query

I THINK what is happening with this query is if there are no records in the GenericAttribute table associated with the Product, then that product is not displayed. See line below in WHERE clause: "AND GenericAttribute.KeyGroup = 'Product'"
Is there a way to reword so that that part of the WHERE is ignored if no associated record in the GenericAttribute table?
Also, looking at my ORDER BY clause, will a record from the product table still show up if it has no associated record in the Pvl_AdDates table?
Thanks!
SELECT DISTINCT Product_Category_Mapping.CategoryId, Product.Id, Product.Name, Product.ShortDescription, Pvl_AdDates.Caption, Pvl_AdDates.EventDateTime, convert(varchar(25), Pvl_AdDates.EventDateTime, 120) AS TheDate, Pvl_AdDates.DisplayOrder, Pvl_Urls.URL, [Address].FirstName, [Address].LastName, [Address].Email, [Address].Company, [Address].City, [Address].Address1, [Address].Address2, [Address].ZipPostalCode, [Address].PhoneNumber
FROM [Address]
RIGHT JOIN (GenericAttribute
RIGHT JOIN (Pvl_Urls RIGHT JOIN (Pvl_AdDates
RIGHT JOIN (Product_Category_Mapping
LEFT JOIN Product
ON Product_Category_Mapping.ProductId = Product.Id)
ON Pvl_AdDates.ProductId = Product.Id)
ON Pvl_Urls.ProductId = Product.Id)
ON GenericAttribute.EntityId = Product.Id)
ON Address.Id = convert(int, GenericAttribute.Value)
WHERE
Product_Category_Mapping.CategoryId=12
AND GenericAttribute.KeyGroup = 'Product'
AND Product.Published=1
AND Product.Deleted=0
AND Product.AvailableStartDateTimeUtc <= getdate()
AND Product.AvailableEndDateTimeUtc >= getdate()
ORDER BY
Pvl_AdDates.EventDateTime DESC,
Product.Id,
Pvl_AdDates.DisplayOrder
I strongly encourage you to not mix left join and right join. I have written many SQL queries and cannot think of an occasion when that was necessary.
In fact, just stick to left join.
If you want all products (or at least all products not filtered out by the where clause), then start with the Product table and go from there:
FROM Product p LEFT JOIN
Product_Category_Mapping pcm
ON pcm.ProductId = p.Id LEFT JOIN
Pvl_AdDates ad
ON ad.ProductId = p.id LEFT JOIN
Pvl_Urls u
ON u.ProductId = p.id LEFT JOIN
GenericAttribute ga
ON ga.EntityId = p.id LEFT JOIN
Address a
ON a.Id = convert(int, ga.Value)
Note that I added table aliases. These make queries easier to write and to read.
I would add a caution. It looks like you are combining data along different dimensions. You are likely to get a Cartesian product of the dimension attributes for each dimension. Perhaps that is what you want or the WHERE clause takes care of the additional rows.
Yes. Put constraints (restrictions) on the table on the optional side of an outer join in the ON condition of that join, not in the WHERE clause. Conditions in the WHERE clause are applied only after the outer joins have been evaluated, so where there is no matching record in the optional table, its columns are NULL, the predicate does not evaluate to true, and the entire row is eliminated, undoing the outer-ness. Conditions in the ON clause are evaluated during the join, before the unmatched rows from the preserved side are added back in, so the result set will still include them.
Second: formatting, formatting, formatting! Stick to one direction of join (left is easier) and use aliases for table names!
SELECT DISTINCT m.CategoryId, p.Id,
p.Name, p.ShortDescription, d.Caption, d.EventDateTime,
convert(varchar(25), d.EventDateTime, 120) TheDate,
d.DisplayOrder, u.URL, a.FirstName, a.LastName,
a.Email, a.Company, a.City, a.Address1, a.Address2,
a.ZipPostalCode, a.PhoneNumber
FROM Product_Category_Mapping m
left join Product p on p.Id = m.ProductId
and p.Published=1
and p.Deleted=0
and p.AvailableStartDateTimeUtc <= getdate()
and p.AvailableEndDateTimeUtc >= getdate()
left join Pvl_AdDates d ON d.ProductId = p.Id
left join Pvl_Urls u ON u.ProductId = p.Id
left join GenericAttribute g ON g.EntityId = p.Id
and g.KeyGroup = 'Product'
left join [Address] a ON a.Id = convert(int, g.Value)
WHERE m.CategoryId=12
ORDER BY d.EventDateTime DESC, p.Id, d.DisplayOrder
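The difference between filtering in WHERE and filtering in ON can be seen on two tiny tables. This is a sketch using SQLite through Python, with simplified, hypothetical table and column names (`product`, `generic_attribute`) and made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER, name TEXT);
    CREATE TABLE generic_attribute (entity_id INTEGER, key_group TEXT, value TEXT);
    INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO generic_attribute VALUES (1, 'Product', '42');  -- only product 1 has one
""")

# Filter in WHERE: the NULL-extended row for 'Gadget' fails the test and is dropped.
in_where = conn.execute("""
    SELECT p.name, g.value
    FROM product p
    LEFT JOIN generic_attribute g ON g.entity_id = p.id
    WHERE g.key_group = 'Product'
    ORDER BY p.id
""").fetchall()

# Filter in ON: the condition only limits which attribute rows can match.
in_on = conn.execute("""
    SELECT p.name, g.value
    FROM product p
    LEFT JOIN generic_attribute g ON g.entity_id = p.id
                                 AND g.key_group = 'Product'
    ORDER BY p.id
""").fetchall()

print(in_where)  # [('Widget', '42')]
print(in_on)     # [('Widget', '42'), ('Gadget', None)]
```

Only the second form keeps the product that has no associated attribute row.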

Query extensibility with WHERE EXISTS with a large table

The following query is designed to find the number of people who went to a hospital, the total number of people who went to a hospital, and then divide those two to find a percentage. The Claims table has over two million rows and has the appropriate non-clustered index on patientid, admissiondate, and dischargedate. The query runs quickly enough, but I'm interested in how I could make it more usable. I would like to be able to add another code in the line where (hcpcs.hcpcs ='97001') and have the change in percentRehabNotHomeHealth be reflected in another column. Is this possible without writing a big, fat join statement where I join the results of the two queries together? I know that by adding the extra column the math won't look right, but I'm not worried about that at the moment. Desired sample output: http://imgur.com/BCLrd
database schema
select h.hospitalname
,count(*) as visitCounts
,hospitalcounts
,round(count(*)/cast(hospitalcounts as float) *100,2) as percentRehabNotHomeHealth
from Patient p
inner join statecounties as sc on sc.countycode = p.countycode
and sc.statecode = p.statecode
inner join hospitals as h on h.npi=p.hospitalnpi
inner join
--this join adds the hospitalCounts column
(
select h.hospitalname, count(*) as hospitalCounts
from hospitals as h
inner join patient as p on p.hospitalnpi=h.npi
where p.statecode='21' and h.statecode='21'
group by h.hospitalname
) as t on t.hospitalname=h.hospitalname
--this where exists clause gives the visitCounts column
where h.stateCode='21' and p.statecode='21'
and exists
(
select distinct p2.patientid
from Patient as p2
inner join Claims as c on c.patientid = p2.patientid
and c.admissiondate = p2.admissiondate
and c.dischargedate = p2.dischargedate
inner join hcpcs on hcpcs.hcpcs=c.hcpcs
inner join hospitals as h on h.npi=p2.hospitalnpi
where (hcpcs.hcpcs ='97001' or hcpcs.hcpcs='9339' or hcpcs.hcpcs='97002')
and p2.patientid=p.patientid
)
and hospitalcounts > 10
group by h.hospitalname, t.hospitalcounts
having count(*)>10
You might look into CTEs (Common Table Expressions) to get what you need. They allow you to get summarized data and join that back to the detail on a common key. As an example, I modified your join on the subquery to be a CTE.
;with hospitalCounts as (
select h.hospitalname, count(*) as hospitalCounts
from hospitals as h
inner join patient as p on p.hospitalnpi=h.npi
where p.statecode='21' and h.statecode='21'
group by h.hospitalname
)
select h.hospitalname
,count(*) as visitCounts
,hospitalcounts
,round(count(*)/cast(hospitalcounts as float) *100,2) as percentRehabNotHomeHealth
from Patient p
inner join statecounties as sc on sc.countycode = p.countycode
and sc.statecode = p.statecode
inner join hospitals as h on h.npi=p.hospitalnpi
inner join hospitalCounts as t on t.hospitalname=h.hospitalname
--this where exists clause gives the visitCounts column
where h.stateCode='21' and p.statecode='21'
and exists
(
select p2.patientid
from Patient as p2
inner join Claims as c on c.patientid = p2.patientid
and c.admissiondate = p2.admissiondate
and c.dischargedate = p2.dischargedate
inner join hcpcs on hcpcs.hcpcs=c.hcpcs
inner join hospitals as h on h.npi=p2.hospitalnpi
where (hcpcs.hcpcs ='97001' or hcpcs.hcpcs='9339' or hcpcs.hcpcs='97002')
and p2.patientid=p.patientid
)
and hospitalcounts > 10
group by h.hospitalname, t.hospitalcounts
having count(*)>10
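One possible way to get a separate column per code, sketched here under heavy assumptions, is to combine a CTE with conditional aggregation (a COUNT over a CASE expression). This is a different technique from the EXISTS filter above, not a rewrite of it. The example uses SQLite through Python with a drastically simplified, hypothetical schema and made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (patientid INTEGER, hospitalname TEXT);
    CREATE TABLE claims (patientid INTEGER, hcpcs TEXT);
    INSERT INTO patient VALUES (1, 'General'), (2, 'General'), (3, 'General'), (4, 'Mercy');
    INSERT INTO claims VALUES (1, '97001'), (2, '97002'), (2, '97001'), (4, '97002');
""")

rows = conn.execute("""
    WITH hospital_counts AS (          -- total patients per hospital
        SELECT hospitalname, COUNT(*) AS hospitalcounts
        FROM patient
        GROUP BY hospitalname
    ),
    code_counts AS (                   -- one column per code of interest
        SELECT p.hospitalname,
               COUNT(DISTINCT CASE WHEN c.hcpcs = '97001' THEN p.patientid END) AS n_97001,
               COUNT(DISTINCT CASE WHEN c.hcpcs = '97002' THEN p.patientid END) AS n_97002
        FROM patient p
        JOIN claims c ON c.patientid = p.patientid
        GROUP BY p.hospitalname
    )
    SELECT hc.hospitalname, hc.hospitalcounts, cc.n_97001, cc.n_97002
    FROM hospital_counts hc
    JOIN code_counts cc ON cc.hospitalname = hc.hospitalname
    ORDER BY hc.hospitalname
""").fetchall()

print(rows)  # [('General', 3, 2, 1), ('Mercy', 1, 0, 1)]
```

Adding another code then means adding one more COUNT(DISTINCT CASE ...) line; each percentage is that column divided by hospitalcounts.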

SQL Inner join division

I have an issue with my inner join division below. Oracle keeps reporting a missing right parenthesis even though I have already closed it. I need to get the names of the patients who have collected all items.
Select P.name
From ((((Select Patientid From Patient) As P
Inner Join (Select Accountno, Patientid From Account) As A1
on P.PatientID = A1.PatientID)
Inner Join (Select Accountno, Itemno From AccountType) As Al
On A1.Accountno = Al.Accountno)
Inner Join (Select Itemno From Item) As I
On Al.Itemno = I.Itemno)
Group By Al.Itemno
Having Count(*) >= (Select Count(*) FROM AccountType);
Here's a simpler approach that I believe is essentially equivalent:
select a.name
from Patient a
inner join Account b on a.PatientID = b.PatientID
inner join AccountType c on b.Accountno = c.Accountno
inner join Item d on c.Itemno = d.Itemno
group by c.Accountno, a.name
having Count(*) >= (Select Count(*) FROM AccountType);
It has the added benefit of being much more likely to use indexes on the tables: if you join what are essentially derived tables built in memory, you don't get the benefit of the indexes that exist on the physical tables.
I also usually alias table names using sequential letters -- 'a', 'b', 'c', 'd' as you can see. I find that when I'm writing complicated queries it makes it easier for me to follow. 'a' is the first table in the join, 'b' is the second, etc.
It sounds like you just want
SELECT p.name
FROM patient p
INNER JOIN account a ON (a.patientID = p.patientID)
INNER JOIN accountType accTyp ON (accTyp.accountNo = a.accountNo)
INNER JOIN item i ON (i.itemNo = accTyp.itemNo)
GROUP BY p.patientID, p.name
HAVING COUNT(*) = (SELECT COUNT(*)
FROM accountType);
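Note that having an alias of A1 and an alias of Al is quite confusing. You want to pick more meaningful and more distinguishing aliases. Also, Oracle does not accept the AS keyword in front of a table alias, which is the likely cause of the "missing right parenthesis" error.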
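For reference, a common way to write this kind of relational division is to group by patient and compare the number of distinct items with the total number of items. The sketch below assumes that "all items" means every row in Item; it uses SQLite through Python with made-up data, so treat it as an illustration rather than a drop-in Oracle query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (patientid INTEGER, name TEXT);
    CREATE TABLE account (accountno INTEGER, patientid INTEGER);
    CREATE TABLE accounttype (accountno INTEGER, itemno INTEGER);
    CREATE TABLE item (itemno INTEGER);
    INSERT INTO patient VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO account VALUES (100, 1), (101, 1), (200, 2);
    INSERT INTO accounttype VALUES (100, 1), (101, 2), (101, 1), (200, 1);
    INSERT INTO item VALUES (1), (2);
""")

# A patient qualifies when the distinct items reached through their
# accounts cover every row in item.
names = conn.execute("""
    SELECT p.name
    FROM patient p
    JOIN account a     ON a.patientid = p.patientid
    JOIN accounttype t ON t.accountno = a.accountno
    GROUP BY p.patientid, p.name
    HAVING COUNT(DISTINCT t.itemno) = (SELECT COUNT(*) FROM item)
    ORDER BY p.name
""").fetchall()

print(names)  # [('Ann',)]
```

COUNT(DISTINCT ...) matters here: Ann has item 1 on two accounts, and a plain COUNT(*) would count it twice.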