Why do I have multiple entries per entity in the query output? - sql

I would like to know why my query displays multiple entries per entity in the output.
From what I understand, there is only one active policy per entity.
I created the query in SQL Server Management Studio (SSMS); the output needs parameters to display correctly, and I have tried the query shown below.
Currently the query output in SSMS displays the following:
Entity_Number | Building_Name | PolicyID | Description | Start_Date | End_Date
400 | Xpress | 4 | 5 Day Grace | 7/1/2019 | 9/27/2019
400 | Xpress | 18 | 2 Day Grace | 7/3/2018 | 7/13/2018
400 | Xpress | 19 | 4 Day Grace | 2/27/2019 | 2/27/2019
What I would really like to know is how to drill down and find out why my query returns multiple rows.
[Query]
SELECT
e.Entity_Number,
bld.Building_Name,
cbp.PolicyId,
cbp.Description,
cbp.StartDate,
cbp.EndDate
FROM
dbo.buildings AS bld
INNER JOIN dbo.entities AS e
ON bld.Entity_ID = e.Entity_ID
INNER JOIN Collections.Building AS cbp
ON bld.Building_ID = cbp.BuildingId
INNER JOIN Collections.BuildingProfile AS cbpro
ON cbp.BuildingPolicyId = cbpro.BuildingPolicyId
WHERE
bld.Building_Active = 1
AND e.Active = 1

Use the "salami technique" to isolate where the unexpected rows come from. What I mean by this is that you cut down the query like a salami by omitting each join (and any column references related to that join) one by one.
e.g. start with masking the join to Collections.BuildingProfile:
SELECT
e.Entity_Number
, bld.Building_Name
, cbp.PolicyId
, cbp.Description
, cbp.StartDate
, cbp.EndDate
FROM dbo.buildings AS bld
INNER JOIN dbo.entities AS e ON bld.Entity_ID = e.Entity_ID
INNER JOIN Collections.Building AS cbp ON bld.Building_ID = cbp.BuildingId
-- INNER JOIN Collections.BuildingProfile AS cbpro ON cbp.BuildingPolicyId = cbpro.BuildingPolicyId
WHERE bld.Building_Active = 1
AND e.Active = 1
Does this remove the unexpected rows? If not, then try:
SELECT
e.Entity_Number
, bld.Building_Name
--, cbp.PolicyId
--, cbp.Description
--, cbp.StartDate
--, cbp.EndDate
FROM dbo.buildings AS bld
INNER JOIN dbo.entities AS e ON bld.Entity_ID = e.Entity_ID
--INNER JOIN Collections.Building AS cbp ON bld.Building_ID = cbp.BuildingId
--INNER JOIN Collections.BuildingProfile AS cbpro ON cbp.BuildingPolicyId = cbpro.BuildingPolicyId
WHERE bld.Building_Active = 1
AND e.Active = 1
Eventually by masking out each join (and any related column references to that table) you will discover which table is producing the unexpected multiplication of rows.
Once that table is identified, I suggest you reconsider all the assumptions you have made about how that table has been joined. For example, you state that "From what I understand there is only one active policy per entity." Is that really true?
Once you know where the problem starts, and you reconsider how that data should actually be used within the query, you should be closer to a solution. e.g. perhaps you need more conditions in the join, or you need to join a subquery instead of directly to the table.
Note:
Collections.BuildingProfile does not seem to be needed by the query, so why not omit it anyway?
Reformatting the select clause as "comma first" helps simplify use of the "salami technique".
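Once a suspect table has been found, a quick way to confirm it is to count its rows per join key. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the `policies` table, its columns, and the sample rows are hypothetical simplifications of the tables in the question.

```python
import sqlite3

# Hypothetical miniature of the question's tables; names and rows are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE buildings (Building_ID INTEGER, Building_Name TEXT);
CREATE TABLE policies  (PolicyId INTEGER, BuildingId INTEGER, Description TEXT);
INSERT INTO buildings VALUES (1, 'Xpress'), (2, 'Other');
INSERT INTO policies  VALUES (4, 1, '5 Day Grace'), (18, 1, '2 Day Grace'),
                             (19, 1, '4 Day Grace'), (7, 2, '3 Day Grace');
""")

# Count rows per join key in the suspect table.
# Any key with a count above 1 multiplies the rows of the outer query.
fan_out = conn.execute("""
    SELECT BuildingId, COUNT(*) AS policy_rows
    FROM policies
    GROUP BY BuildingId
    HAVING COUNT(*) > 1
    ORDER BY BuildingId
""").fetchall()

print(fan_out)  # [(1, 3)] : building 1 has three policy rows
```

If such a check returns rows, the "one policy per entity" assumption does not hold for that table as joined, and the join needs an extra condition (for example on dates or an active flag, if the real table has one).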

Related

How to Display Two Query Side by Side using SQL

I have the following two queries, which are almost identical except that the second query contains an extra table join and WHERE condition and returns fewer FAXDEPTs. (The commented-out portion is what Query 2 adds; I am sharing just the one query to make it easier to read.)
I want to join the two queries by FAXDEPT and have the results look something like this:
FAXDEPT | TOTAL DOCUMENTS (Query1) | TOTAL PAGES (Query1) | TOTAL DOCUMENTS (Query2) | TOTAL PAGES (Query2)
Query 1 contains more FAXDEPTs than Query 2. Is there a way I can display 0s for the Query 2 TOTAL PAGES and TOTAL DOCUMENTS where a FAXDEPT is missing?
I assumed I would need some sort of FULL OUTER JOIN, but I can't seem to get it to work. I'm not sure whether using aliases is part of my issue. I appreciate all the help in advance!
select sq.faxdept 'Fax Department', count(sq.docid)'Total Documents', sum(sq.pages)'Total Pages'
from
(
select idp.id as docid, (MAX(idp.pagenum)+1) as pages, ki178.keyvaluechar as faxdept
from DOCDATA dd
left join FAXDEPT ki178 on id.id = ki178.id
left join IMPORTSOURCE ki228 on id.id = ki228.id
left join BATCHINFO kgd105 on dd.id = kgd105.id
left join PAGEDATA idp on id.id = idp.id
--left join QUEUE ilc on id.id = ilc.id
where
id.datestored > '10/07/2021' and id.status = 0
--and ilc.num = 252
and (kgd105.kg128 like 'DISC DIP%SJIN%' or ki228.keyvaluechar like 'SJIN' )
group by idp.id, ki178.keyvaluechar
)
as sq
group by sq.faxdept
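One possible shape for the side-by-side result is to aggregate each query in its own derived table and LEFT JOIN the second onto the first, using COALESCE to turn missing values into 0. This is only a sketch under assumptions: it runs on SQLite through Python's sqlite3 module, and the `all_docs` and `filtered_docs` tables are hypothetical stand-ins for the inner results of Query 1 and Query 2.

```python
import sqlite3

# Hypothetical stand-ins for the per-document results of Query 1 and Query 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE all_docs      (faxdept TEXT, docid INTEGER, pages INTEGER);
CREATE TABLE filtered_docs (faxdept TEXT, docid INTEGER, pages INTEGER);
INSERT INTO all_docs      VALUES ('A', 1, 2), ('A', 2, 3), ('B', 3, 1);
INSERT INTO filtered_docs VALUES ('A', 1, 2);
""")

# Aggregate each side first, then join on faxdept.
# Query 1 drives the result, so a LEFT JOIN keeps every one of its departments.
rows = conn.execute("""
    SELECT q1.faxdept,
           q1.total_docs,
           q1.total_pages,
           COALESCE(q2.total_docs, 0)  AS q2_total_docs,
           COALESCE(q2.total_pages, 0) AS q2_total_pages
    FROM (SELECT faxdept, COUNT(docid) AS total_docs, SUM(pages) AS total_pages
          FROM all_docs GROUP BY faxdept) AS q1
    LEFT JOIN
         (SELECT faxdept, COUNT(docid) AS total_docs, SUM(pages) AS total_pages
          FROM filtered_docs GROUP BY faxdept) AS q2
      ON q2.faxdept = q1.faxdept
    ORDER BY q1.faxdept
""").fetchall()

print(rows)  # [('A', 2, 5, 1, 2), ('B', 1, 1, 0, 0)]
```

Because Query 1 is described as containing every FAXDEPT that Query 2 has, a LEFT JOIN is enough here; a FULL OUTER JOIN would only matter if Query 2 could contain departments that Query 1 lacks.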

Left Join On And clause not supported

I've looked into various posts (this one, that one and this other one) and thought I got the answer.
After a LEFT JOIN I may add an ON [condition] AND [other condition] (I've also tried WHERE). But computer says no. Access keeps saying the join expression is not supported.
Consider the student_records table below:
STUDENTCODE | SEMESTERINDEX
12345 | 20112
12345 | 20113
12345 | 20121
67890 | 0
67890 | 20111
67890 | 20112
I want to find the minimum SEMESTERINDEX for each student from my students table, that's above 20001. (Records below may be erroneous and the 0 and 1 SEMESTERINDEX is used for transferred credits.)
I'm using Access, so there are VBA functions inside the SQL. I'm joining several more tables too, so I'm quoting the whole query.
SELECT students.STUDENTCODE, prefixes.PREFIXNAMEENG,
students.STUDENTNAMEENG, students.STUDENTSURNAMEENG, levels.level_name, programs.PROGRAMNAMEENG, calendars.calendar_load,
MAX(student_records.SEMESTERINDEX) AS latest_semester, MIN(student_records.SEMESTERINDEX) AS intake_semester,
FROM student_records LEFT JOIN (
(
(
(
(students LEFT JOIN prefixes ON students.PREFIXID = prefixes.PREFIXID)
LEFT JOIN levels ON students.LEVELID = levels.level_id)
LEFT JOIN programs ON students.PROGRAMID = programs.PROGRAMID)
LEFT JOIN calendar_conversion ON students.SCHEDULEGROUPID = calendar_conversion.schedule_id)
LEFT JOIN calendars ON calendar_conversion.calendar_id = calendars.calendar_id) ON student_records.STUDENTCODE = students.STUDENTCODE AND student_records.SEMESTERINDEX> 2001
GROUP BY students.STUDENTCODE, prefixes.PREFIXNAMEENG, students.STUDENTNAMEENG, students.STUDENTSURNAMEENG, levels.level_name, programs.PROGRAMNAMEENG, calendars.calendar_load;
So did I misplace the AND student_records.SEMESTERINDEX > 2001?
Oh my, save me from these parentheses and crazy indenting.
Here is how you do it. The parentheses don't matter in most SQL dialects, although Access itself requires them when a query has more than one join.
SELECT
students.STUDENTCODE,
prefixes.PREFIXNAMEENG,
students.STUDENTNAMEENG,
students.STUDENTSURNAMEENG,
levels.level_name,
programs.PROGRAMNAMEENG,
calendars.calendar_load,
minmax.latest_semester,
minmax.intake_semester
FROM student_records
LEFT JOIN (
SELECT
studentcode,
MAX(student_records.SEMESTERINDEX) AS latest_semester,
MIN(student_records.SEMESTERINDEX) AS intake_semester
FROM student_records
WHERE student_records.SEMESTERINDEX > 2001
GROUP BY studentcode
) as MinMax ON student_records.STUDENTCODE = minmax.STUDENTCODE
LEFT JOIN students ON student_records.STUDENTCODE = students.STUDENTCODE
LEFT JOIN prefixes ON students.PREFIXID = prefixes.PREFIXID
LEFT JOIN levels ON students.LEVELID = levels.level_id
LEFT JOIN programs ON students.PROGRAMID = programs.PROGRAMID
LEFT JOIN calendar_conversion ON students.SCHEDULEGROUPID = calendar_conversion.schedule_id
LEFT JOIN calendars ON calendar_conversion.calendar_id = calendars.calendar_id
This is called a subquery in SQL. It allows you to perform your grouping on a subset and then join that back to the rest of the data.
I think you went wrong thinking there was something about the join that needed a filter -- in fact it is the data that you were joining to that needed to be filtered.
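The filter-then-join idea can be tried on the sample rows from the question. This is a minimal sketch, not the Access query itself: it runs on SQLite via Python's sqlite3 module, the student names are made-up placeholders, the threshold 20001 is taken from the question's prose, and the outer query starts from `students` so that each student appears once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students        (STUDENTCODE INTEGER, STUDENTNAMEENG TEXT);
CREATE TABLE student_records (STUDENTCODE INTEGER, SEMESTERINDEX INTEGER);
-- Placeholder names; the semester rows are the sample data from the question.
INSERT INTO students VALUES (12345, 'Student A'), (67890, 'Student B');
INSERT INTO student_records VALUES
    (12345, 20112), (12345, 20113), (12345, 20121),
    (67890, 0),     (67890, 20111), (67890, 20112);
""")

# Filter and group inside the subquery, then join the one-row-per-student
# result back to students.
rows = conn.execute("""
    SELECT s.STUDENTCODE, mm.intake_semester, mm.latest_semester
    FROM students AS s
    LEFT JOIN (
        SELECT STUDENTCODE,
               MIN(SEMESTERINDEX) AS intake_semester,
               MAX(SEMESTERINDEX) AS latest_semester
        FROM student_records
        WHERE SEMESTERINDEX > 20001
        GROUP BY STUDENTCODE
    ) AS mm ON mm.STUDENTCODE = s.STUDENTCODE
    ORDER BY s.STUDENTCODE
""").fetchall()

print(rows)  # [(12345, 20112, 20121), (67890, 20111, 20112)]
```

The 0 entry for student 67890 is excluded by the WHERE clause inside the subquery, so it never reaches the MIN.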

Retrieve additional rows if bit flag is true

I have a large stored procedure that is used to return results for a dialog with many selections. I have a new criterion: get "extra" rows if a particular bit column is set to true. The current setup looks like this:
SELECT
CustomerID,
FirstName,
LastName,
...
FROM HumongousQuery hq
LEFT JOIN (
-- New Query Text
) newSubQuery nsq ON hq.CustomerID = nsq.CustomerID
I have the first half of the new query:
SELECT DISTINCT
c.CustomerID,
pp.ProjectID,
ep.ProductID
FROM Customers c
JOIN Evaluations e (NOLOCK)
ON c.CustomerID = e.CustomerID
JOIN EvaluationProducts ep (NOLOCK)
ON e.EvaluationID = ep.EvaluationID
JOIN ProjectProducts pp (NOLOCK)
ON ep.ProductID = pp.ProductID
JOIN Projects p
ON pp.ProjectID = p.ProjectID
WHERE
c.EmployeeID = @EmployeeID
AND e.CurrentStepID = 5
AND p.IsComplete = 0
The Projects table has a bit column, AllowIndirectCustomers, which tells me that this project can use additional customers when the value is true. As far as I can tell, the majority of the different SQL constructs are geared towards adding additional columns to the result set. I tried different permutations of the UNION command, with no luck. Normally, I would turn to a table-valued function, but I haven't been able to make it work with this scenario.
This one has been a stumper for me. Any ideas?
So basically, you're looking to negate the need to match pp.ProjectID = p.ProjectID when the flag is set. You can do that right in the JOIN criteria:
JOIN Projects p
ON pp.ProjectID = p.ProjectID OR p.AllowIndirectCustomers = 1
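To see what the OR in the join criteria does to the row count, here is a small sketch on SQLite through Python's sqlite3 module. The two tables are cut down to the columns involved, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProjectProducts (ProjectID INTEGER, ProductID INTEGER);
CREATE TABLE Projects        (ProjectID INTEGER, AllowIndirectCustomers INTEGER);
-- Hypothetical data: product 100 belongs to project 1 only.
INSERT INTO ProjectProducts VALUES (1, 100);
INSERT INTO Projects VALUES (1, 0), (2, 1), (3, 0);
""")

# A project row joins when the IDs match OR when its flag is set.
rows = conn.execute("""
    SELECT pp.ProductID, p.ProjectID
    FROM ProjectProducts AS pp
    JOIN Projects AS p
      ON pp.ProjectID = p.ProjectID OR p.AllowIndirectCustomers = 1
    ORDER BY p.ProjectID
""").fetchall()

print(rows)  # [(100, 1), (100, 2)]
```

Project 2 comes back as an "extra" row because its flag is set, while project 3 does not. Note that every flagged project pairs with every ProjectProducts row, so on real data the other join conditions and the WHERE clause need to keep that from growing too large.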
Depending on the complexity of your tables, this might not work out too easily, but you could do a case statement on your bit column. Something like this:
select table1.id, table1.value,
case table1.flag
when 1 then
table2.value
else null
end as secondvalue
from table1
left join table2 on table1.id = table2.id

Receiving 1 row from joined (1 to many) postgresql

I have this problem:
I have 2 major tables (apartments, tenants) that have a connection of 1 to many (1 apartment, many tenants).
I'm trying to pull all the apartments in my building, each with just one of its tenants.
The preferred tenant is the one who has ot=2 (there are 2 possible values: 1 or 2).
I tried to use subqueries, but PostgreSQL doesn't let a subquery in the select list return more than one column.
I don't know how to solve it. Here is my latest code:
SELECT a.apartment_id, a.apartment_num, a.floor, at.app_type_desc_he, tn.otype_desc_he, tn.e_name
FROM
public.apartments a INNER JOIN public.apartment_types at ON
at.app_type_id = a.apartment_type INNER JOIN
(select t.apartment_id, t.building_id, ot.otype_id, ot.otype_desc_he, e.e_name
from public.tenants t INNER JOIN public.ownership_types ot ON
ot.otype_id = t.ownership_type INNER JOIN entities e ON
t.entity_id = e.entity_id
) tn ON
a.apartment_id = tn.apartment_id AND tn.building_id = a.building_id
WHERE
a.building_id = 4 AND tn.building_id=4
ORDER BY
a.apartment_num ASC,
tn.otype_id DESC
Thanx in advance
SELECT a.apartment_id, a.apartment_num, a.floor
,at.app_type_desc_he, tn.otype_desc_he, tn.e_name
FROM public.apartments a
JOIN public.apartment_types at ON at.app_type_id = a.apartment_type
LEFT JOIN (
SELECT t.apartment_id, t.building_id, ot.otype_id
,ot.otype_desc_he, e.e_name
FROM public.tenants t
JOIN public.ownership_types ot ON ot.otype_id = t.ownership_type
JOIN entities e ON t.entity_id = e.entity_id
ORDER BY (ot.otype_id = 2) DESC
LIMIT 1
) tn ON (tn.apartment_id, tn.building_id)=(a.apartment_id, a.building_id)
WHERE a.building_id = 4
AND tn.building_id = 4
ORDER BY a.apartment_num; -- , tn.otype_id DESC -- pointless
Crucial part emphasized.
This works in either case.
If there are tenants for an apartment, exactly 1 will be returned.
If there is one (or more) tenant of ot.otype_id = 2, it will be one of that type.
If there are no tenants, the apartment is still returned.
If, for ot.otype_id, "there are 2 possible values: 1 or 2", you can simplify to:
ORDER BY ot.otype_id DESC
Debug query
Try removing the WHERE clauses from the base query and change
JOIN public.apartment_types
to
LEFT JOIN public.apartment_types
and add them back one by one to see which condition excludes all rows.
Do at.app_type_id and a.apartment_type really match?
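One thing worth verifying when adapting this: a LIMIT 1 inside a derived table that does not reference the outer row caps the whole derived table at a single row, not one row per apartment. A per-apartment pick needs the subquery to be correlated with the apartment. The sketch below shows one way to do that; it runs on SQLite via Python's sqlite3 module rather than PostgreSQL, and the `tenant_id` column, the simplified tables, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE apartments (apartment_id INTEGER, apartment_num INTEGER);
CREATE TABLE tenants (tenant_id INTEGER, apartment_id INTEGER,
                      ownership_type INTEGER, e_name TEXT);
-- Hypothetical data: apartment 103 has no tenants at all.
INSERT INTO apartments VALUES (1, 101), (2, 102), (3, 103);
INSERT INTO tenants VALUES (10, 1, 1, 'first'),
                           (11, 1, 2, 'second'),
                           (12, 2, 1, 'third');
""")

# For each apartment, the correlated subquery picks a single tenant_id,
# preferring ownership_type = 2; the LEFT JOIN keeps empty apartments.
rows = conn.execute("""
    SELECT a.apartment_num, t.e_name
    FROM apartments AS a
    LEFT JOIN tenants AS t
      ON t.tenant_id = (
           SELECT t2.tenant_id
           FROM tenants AS t2
           WHERE t2.apartment_id = a.apartment_id
           ORDER BY (t2.ownership_type = 2) DESC, t2.tenant_id
           LIMIT 1)
    ORDER BY a.apartment_num
""").fetchall()

print(rows)  # [(101, 'second'), (102, 'third'), (103, None)]
```

In PostgreSQL the same effect is commonly achieved with DISTINCT ON or a LATERAL join; the correlated form is used here only because it also runs on SQLite.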

Top 1 on Left Join SubQuery

I am trying to take a person and display their current insurance along with their former insurance. I guess one could say that I'm trying to flatten my view of customers or people. I'm running into an issue where I'm getting multiple records back due to multiple records existing within my left join subqueries. I had hoped I could solve this by adding "TOP 1" to the subquery, but that actually returns nothing...
Any ideas?
SELECT
p.person_id AS 'MIRID'
, p.firstname AS 'FIRST'
, p.lastname AS 'LAST'
, pg.name AS 'GROUP'
, e.name AS 'AOR'
, p.leaddate AS 'CONTACT DATE'
, [dbo].[GetPICampaignDisp](p.person_id, '2009') AS 'PI - 2009'
, [dbo].[GetPICampaignDisp](p.person_id, '2008') AS 'PI - 2008'
, [dbo].[GetPICampaignDisp](p.person_id, '2007') AS 'PI - 2007'
, a_disp.name AS 'CURR DISP'
, a_ins.name AS 'CURR INS'
, a_prodtype.name AS 'CURR INS TYPE'
, a_t.date AS 'CURR INS APP DATE'
, a_t.effdate AS 'CURR INS EFF DATE'
, b_disp.name AS 'PREV DISP'
, b_ins.name AS 'PREV INS'
, b_prodtype.name AS 'PREV INS TYPE'
, b_t.date AS 'PREV INS APP DATE'
, b_t.effdate AS 'PREV INS EFF DATE'
, b_t.termdate AS 'PREV INS TERM DATE'
FROM
[person] p
LEFT OUTER JOIN
[employee] e
ON
e.employee_id = p.agentofrecord_id
INNER JOIN
[dbo].[person_physician] pp
ON
p.person_id = pp.person_id
INNER JOIN
[dbo].[physician] ph
ON
ph.physician_id = pp.physician_id
INNER JOIN
[dbo].[clinic] c
ON
c.clinic_id = ph.clinic_id
INNER JOIN
[dbo].[d_Physgroup] pg
ON
pg.d_physgroup_id = c.physgroup_id
LEFT OUTER JOIN
(
SELECT
tr1.*
FROM
[transaction] tr1
LEFT OUTER JOIN
[d_vendor] ins1
ON
ins1.d_vendor_id = tr1.d_vendor_id
LEFT OUTER JOIN
[d_product_type] prodtype1
ON
prodtype1.d_product_type_id = tr1.d_product_type_id
LEFT OUTER JOIN
[d_commission_type] ctype1
ON
ctype1.d_commission_type_id = tr1.d_commission_type_id
WHERE
prodtype1.name <> 'Medicare Part D'
AND tr1.termdate IS NULL
) AS a_t
ON
a_t.person_id = p.person_id
LEFT OUTER JOIN
[d_vendor] a_ins
ON
a_ins.d_vendor_id = a_t.d_vendor_id
LEFT OUTER JOIN
[d_product_type] a_prodtype
ON
a_prodtype.d_product_type_id = a_t.d_product_type_id
LEFT OUTER JOIN
[d_commission_type] a_ctype
ON
a_ctype.d_commission_type_id = a_t.d_commission_type_id
LEFT OUTER JOIN
[d_disposition] a_disp
ON
a_disp.d_disposition_id = a_t.d_disposition_id
LEFT OUTER JOIN
(
SELECT
tr2.*
FROM
[transaction] tr2
LEFT OUTER JOIN
[d_vendor] ins2
ON
ins2.d_vendor_id = tr2.d_vendor_id
LEFT OUTER JOIN
[d_product_type] prodtype2
ON
prodtype2.d_product_type_id = tr2.d_product_type_id
LEFT OUTER JOIN
[d_commission_type] ctype2
ON
ctype2.d_commission_type_id = tr2.d_commission_type_id
WHERE
prodtype2.name <> 'Medicare Part D'
AND tr2.termdate IS NOT NULL
) AS b_t
ON
b_t.person_id = p.person_id
LEFT OUTER JOIN
[d_vendor] b_ins
ON
b_ins.d_vendor_id = b_t.d_vendor_id
LEFT OUTER JOIN
[d_product_type] b_prodtype
ON
b_prodtype.d_product_type_id = b_t.d_product_type_id
LEFT OUTER JOIN
[d_commission_type] b_ctype
ON
b_ctype.d_commission_type_id = b_t.d_commission_type_id
LEFT OUTER JOIN
[d_disposition] b_disp
ON
b_disp.d_disposition_id = b_t.d_disposition_id
WHERE
pg.d_physgroup_id = @PhysGroupID
In SQL Server 2005 and later you can use OUTER APPLY:
SELECT p.person_id, s.employee_id
FROM person p
OUTER APPLY (SELECT TOP 1 *
FROM Employee
WHERE /*JOINCONDITION*/
ORDER BY /*Something*/ DESC) s
http://technet.microsoft.com/en-us/library/ms175156.aspx
The pattern I normally use for this is:
SELECT whatever
FROM person
LEFT JOIN subtable AS s1
ON s1.personid = person.personid
...
WHERE NOT EXISTS
( SELECT 1 FROM subtable
WHERE personid = person.personid
AND orderbydate > s1.orderbydate
)
Which avoids the TOP 1 clause and maybe makes it a little clearer.
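The NOT EXISTS pattern above can be run as-is on a toy schema. This is a minimal sketch using Python's sqlite3 module; the `vendor` column and the sample rows are hypothetical, and the inner reference to `subtable` is given the alias `s2` only to make the correlation easier to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person   (personid INTEGER, name TEXT);
CREATE TABLE subtable (personid INTEGER, orderbydate TEXT, vendor TEXT);
-- Hypothetical data: person 1 has two rows, person 2 has none.
INSERT INTO person VALUES (1, 'P1'), (2, 'P2');
INSERT INTO subtable VALUES (1, '2008-01-01', 'old'),
                            (1, '2009-01-01', 'new');
""")

# Keep a joined row only if no later row exists for the same person.
rows = conn.execute("""
    SELECT person.personid, s1.vendor
    FROM person
    LEFT JOIN subtable AS s1
      ON s1.personid = person.personid
    WHERE NOT EXISTS (
        SELECT 1 FROM subtable AS s2
        WHERE s2.personid = person.personid
          AND s2.orderbydate > s1.orderbydate)
    ORDER BY person.personid
""").fetchall()

print(rows)  # [(1, 'new'), (2, None)]
```

Person 2 survives because the subquery finds no rows at all for them. If two rows for the same person share the latest orderbydate, both are returned, so the ordering column needs to be unique per person (or a tie-breaker has to be added to the comparison).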
BTW, I like the way you've put this query together in general, except I'd leave out the brackets, assuming you have rationally named tables and columns; and you might even gain some performance (but at least elegance) by listing columns for tr1 and tr2, rather than "tr1.*" and "tr2.*".
Thanks for all of the feedback and ideas...
In the simplest of terms, I have a person table that stores contact information like name, email, etc. I have another table that stores transactions. Each transaction is really an insurance policy that would contain information on the provider, product type, product name, etc.
I want to avoid giving the user duplicate person records since this causes them to look for the duplicates prior to running mail merges, etc. I'm getting duplicates when there is more than 1 transaction that has not been terminated, and when there is more than 1 transaction that has been terminated.
Someone else suggested that I consider a cursor to grab my distinct contact records and then perform the sub selects to get the current and previous insurance information. I don't know if I want to head down that path though.
It's difficult to understand your question so first I'll throw this out there: does changing your SELECT to SELECT DISTINCT do what you want?
Otherwise, let me get this straight, you're trying to get your customers' current insurance and previous insurance, but they may actually have many insurances before that, recorded in the [transactions] table? I looked at your SQL for quite a few minutes but can't figure out what it all means, so could you please reduce it down to only the parts that are necessary? Then I'll think about it some more. It sounds to me like you need a GROUP BY somehow, but I can't work it out exactly yet.
Couldn't take the time to dig through all your SQL (what a beast!), here's an idea that might make things easier to handle:
select
p.person_id, p.name <and other person columns>,
(select <current policy columns>
from pol <and other tables for policy>
where pol.<columns for join> = p.person_id
and <restrictions for current policy>),
(select <previous policy columns>
from pol <and other tables for policy>
where pol.<columns for join> = p.person_id
and <restrictions for previous policy>),
<other columns>
from person p <and "directly related" tables>
This makes the statement easier to read by separating the different parts into their own subselects, and it also makes it easier to add a "Top 1" in without affecting the rest of the statement. Hope that helps.