SQL joined tables are causing duplicates - sql

So table A is an overall table of policy_id information, while table b is policy_id's with claims attached. Not all of the id's in A exist in B, but I want to join the two tables and sum(total claims).
The issue is that the sum is way higher than the actual sum within the table itself.
Here is what I've tried so far:
select a.policy_id, coalesce(sum(b.claim_amt), 0)
from database.table1 as a
left join database2.table2 as b on a.policy_id = b.policy_id
where product_code = 'CI'
group by a.policy_id
The id's that don't exist in b show up just fine with a 0 next to them, it's the ones that do exist where the claim_amt's seem like they're being duplicated heavily in the sum.

I suspect the policy_id values in table1 are not unique, and that leads to the doubled, tripled, etc. amounts.
You could aggregate the sums from table2 in a CTE to get around this.
WITH CTE AS (
SELECT
policy_id,
coalesce(sum(claim_amt), 0) as sum_amt
FROM database2.table2
group by policy_id
)
-- the outer coalesce returns 0 for policies that have no row in the CTE
select a.policy_id, coalesce(b.sum_amt, 0) as sum_amt
from database.table1 as a
left join CTE as b on a.policy_id = b.policy_id
where product_code = 'CI'
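To see the effect on a small scale, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made up for illustration, the schema prefixes (database., database2.) are dropped, and it assumes the cause really is a repeated policy_id in table1. Note that the CTE fixes the sums but does not remove the repeated table1 row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (policy_id INTEGER, product_code TEXT);
    CREATE TABLE table2 (policy_id INTEGER, claim_amt INTEGER);
    -- hypothetical data: policy 1 is listed twice, policy 2 has no claims
    INSERT INTO table1 VALUES (1, 'CI'), (1, 'CI'), (2, 'CI');
    INSERT INTO table2 VALUES (1, 100), (1, 50);
""")

# Original query: every table1 row for policy 1 matches every claim row,
# so each claim is counted twice.
naive = conn.execute("""
    SELECT a.policy_id, COALESCE(SUM(b.claim_amt), 0)
    FROM table1 AS a
    LEFT JOIN table2 AS b ON a.policy_id = b.policy_id
    WHERE a.product_code = 'CI'
    GROUP BY a.policy_id
    ORDER BY a.policy_id
""").fetchall()

# Pre-aggregated claims: one CTE row per policy, so nothing is summed twice.
fixed = conn.execute("""
    WITH cte AS (
        SELECT policy_id, COALESCE(SUM(claim_amt), 0) AS sum_amt
        FROM table2
        GROUP BY policy_id
    )
    SELECT a.policy_id, COALESCE(b.sum_amt, 0) AS sum_amt
    FROM table1 AS a
    LEFT JOIN cte AS b ON a.policy_id = b.policy_id
    WHERE a.product_code = 'CI'
    ORDER BY a.policy_id
""").fetchall()

print("naive:", naive)
print("fixed:", fixed)
```

With this data the naive query reports 300 for policy 1, while the CTE version reports the true total of 150 (once per table1 row for that policy).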

Related

How to avoid duplicates in left table where primary key is not unique in joined table

I am having SUM issues when joining 2 tables, where the primary key is unique in the left table but can be duplicated in the right table. In my scenario a case_id may have, for example, a payment of £100 in the left table, which is then broken down at a lower level into 2 £50 payments in the right table. This causes the left table payment to be counted twice when joining, because the case_id exists twice in the right table.
I have tried a number of different variations of the query but have so far been unsuccessful. I have also searched this website but have been unable to find a scenario that fits mine.
select distinct
t1.[r_code],
t1.[parent_case_id],
sum(t1.[total_redress_value]),
sum(t2.[payment_amount])
from
[SomeTable1] t1
left join
[SomeTable2] t2 on t1.[case_id] = t2.[case_id]
group by
t1.[r_code], t1.[parent_case_id]
I expect the SUM of total_redress_value and of payment_amount to be 100 each, but the SUM of total_redress_value comes out as 200 because the join duplicates the case_id row. Any help greatly appreciated.
Group your right table by the PK of the left.
SELECT DISTINCT
t1.[r_code],
t1.[parent_case_id],
SUM(t1.[total_redress_value]),
SUM(t2.[payment_amount])
FROM [SomeTable1] t1
LEFT JOIN
(
SELECT case_id,
MIN(payment_amount) AS payment_amount --or sum etc - whatever fits your logic
FROM [SomeTable2]
GROUP BY case_id
) AS t2
ON t1.[case_id] = t2.[case_id]
GROUP BY t1.[r_code],
t1.[parent_case_id];
Unfortunately, this type of hierarchical calculation is a little complicated. You can pre-aggregate t2 before joining:
select t1.[r_code], t1.[parent_case_id],
sum(t1.[total_redress_value]),
sum(t2.[payment_amount])
from [SomeTable1] as t1 left join
(select t2.case_id, sum(t2.payment_amount) as payment_amount
from [SomeTable2] as t2
group by t2.case_id
) as t2
on t1.[case_id] = t2.[case_id]
group by t1.[r_code], t1.[parent_case_id]
Note that select distinct is almost never needed with group by. And it is certainly not needed in this case.
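The £100 / 2 x £50 scenario from the question can be reproduced with a short sqlite3 sketch. The column types and the single sample case are assumptions made for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SomeTable1 (case_id INTEGER PRIMARY KEY, r_code TEXT,
                             parent_case_id INTEGER, total_redress_value INTEGER);
    CREATE TABLE SomeTable2 (case_id INTEGER, payment_amount INTEGER);
    -- hypothetical data: one case worth 100, paid as two payments of 50
    INSERT INTO SomeTable1 VALUES (1, 'R1', 10, 100);
    INSERT INTO SomeTable2 VALUES (1, 50), (1, 50);
""")

# Joining the raw payment rows repeats the left row once per payment.
naive = conn.execute("""
    SELECT t1.r_code, t1.parent_case_id,
           SUM(t1.total_redress_value), SUM(t2.payment_amount)
    FROM SomeTable1 AS t1
    LEFT JOIN SomeTable2 AS t2 ON t1.case_id = t2.case_id
    GROUP BY t1.r_code, t1.parent_case_id
""").fetchall()

# Collapsing SomeTable2 to one row per case_id first keeps the join 1:1.
pre_aggregated = conn.execute("""
    SELECT t1.r_code, t1.parent_case_id,
           SUM(t1.total_redress_value), SUM(t2.payment_amount)
    FROM SomeTable1 AS t1
    LEFT JOIN (SELECT case_id, SUM(payment_amount) AS payment_amount
               FROM SomeTable2
               GROUP BY case_id) AS t2
           ON t1.case_id = t2.case_id
    GROUP BY t1.r_code, t1.parent_case_id
""").fetchall()

print("naive:", naive)
print("pre-aggregated:", pre_aggregated)
```

The naive query returns 200 and 100 for the two sums, and the pre-aggregated version returns 100 and 100, which is what the question expects.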

How can I join 3 tables and calculate the correct sum of fields from 2 tables, without duplicate rows?

I have tables A, B, C. Table A is linked to B, and table A is linked to C. I want to join the 3 tables and find the sum of B.cost and the sum of C.clicks. However, it is not giving me the expected value, and when I select everything without the group by, it is showing duplicate rows. I am expecting the row values from B to roll up into a single sum, and the row values from C to roll up into a single sum.
My query looks like
select A.*, sum(B.cost), sum(C.clicks) from A
join B
left join C
group by A.id
having sum(cost) > 10
I tried to group by B.a_id and C.another_field_in_a also, but that didn't work.
Here is a DB fiddle with all of the data and the full query:
http://sqlfiddle.com/#!9/768745/13
Notice how the sum fields are greater than the sum of the individual tables? I'm expecting the sums to be equal, containing only the rows of the table B and C once. I also tried adding distinct but that didn't help.
I'm using Postgres. (The fiddle is set to MySQL though.) Ultimately I will want to use a having clause to select the rows according to their sums. This query will be for millions of rows.
If I understand the logic correctly, the problem is the Cartesian product caused by the two joins. Your query is a bit hard to follow, but I think the intent is better handled with correlated subqueries:
select k.*,
(select sum(cost)
from ad_group_keyword_network n
where n.event_date >= '2015-12-27' and
n.ad_group_keyword_id = 1210802 and
k.id = n.ad_group_keyword_id
) as cost,
(select sum(clicks)
from keyword_click c
where (c.date is null or c.date >= '2015-12-27') and
k.keyword_id = c.keyword_id
) as clicks
from ad_group_keyword k
where k.status = 2 ;
Here is the corresponding SQL Fiddle.
EDIT:
The subselect should be faster than the group by on the unaggregated data. However, you need the right indexes: ad_group_keyword_network(ad_group_keyword_id, event_date, cost) and keyword_click(keyword_id, date, clicks).
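As a rough illustration of the Cartesian-product problem and the correlated-subquery fix, here is a sketch using sqlite3 and the simplified A, B, C names from the question. The columns (a_id, keyword_id) follow the question's description; the rows are invented, and the date and status filters are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, keyword_id INTEGER);
    CREATE TABLE B (a_id INTEGER, cost INTEGER);
    CREATE TABLE C (keyword_id INTEGER, clicks INTEGER);
    -- hypothetical data: one A row with 2 cost rows and 3 click rows
    INSERT INTO A VALUES (1, 7);
    INSERT INTO B VALUES (1, 5), (1, 6);
    INSERT INTO C VALUES (7, 10), (7, 20), (7, 30);
""")

# Two one-to-many joins produce 2 x 3 = 6 rows for the single A row,
# so each cost is summed 3 times and each click count twice.
naive = conn.execute("""
    SELECT A.id, SUM(B.cost), SUM(C.clicks)
    FROM A
    JOIN B ON B.a_id = A.id
    LEFT JOIN C ON C.keyword_id = A.keyword_id
    GROUP BY A.id
""").fetchall()

# Correlated subqueries aggregate each child table independently.
correlated = conn.execute("""
    SELECT A.id,
           (SELECT SUM(B.cost) FROM B WHERE B.a_id = A.id) AS cost,
           (SELECT SUM(C.clicks) FROM C WHERE C.keyword_id = A.keyword_id) AS clicks
    FROM A
""").fetchall()

print("naive:", naive)
print("correlated:", correlated)
```

Here the naive join reports 33 and 120, while the true totals of 11 and 60 come back from the correlated version.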
I found this (MySQL joining tables group by sum issue) and created a query like this
select *
from A
join (select B.a_id, sum(B.cost) as cost
from B
group by B.a_id) B on A.id = B.a_id
left join (select C.keyword_id, sum(C.clicks) as clicks
from C
group by C.keyword_id) C on A.keyword_id = C.keyword_id
group by A.id
having sum(cost) > 10
I don't know if it's efficient though. I don't know if it's more or less efficient than Gordon's. I ran both queries and this one seemed faster, 27s vs. 2m35s. Here is a fiddle: http://sqlfiddle.com/#!15/c61c74/10
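For comparison, the derived-table idea can be sketched the same way. This is not the exact query above: because each derived table already has one row per key, the sketch drops the outer group by and filters with a plain WHERE instead of HAVING. The aliases b_sum and c_sum and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, keyword_id INTEGER);
    CREATE TABLE B (a_id INTEGER, cost INTEGER);
    CREATE TABLE C (keyword_id INTEGER, clicks INTEGER);
    -- hypothetical data: A.id 1 costs 11 in total, A.id 2 costs only 3
    INSERT INTO A VALUES (1, 7), (2, 8);
    INSERT INTO B VALUES (1, 5), (1, 6), (2, 3);
    INSERT INTO C VALUES (7, 10), (7, 20);
""")

rows = conn.execute("""
    SELECT A.id, b_sum.cost, c_sum.clicks
    FROM A
    JOIN (SELECT a_id, SUM(cost) AS cost
          FROM B
          GROUP BY a_id) AS b_sum
      ON A.id = b_sum.a_id
    LEFT JOIN (SELECT keyword_id, SUM(clicks) AS clicks
               FROM C
               GROUP BY keyword_id) AS c_sum
      ON A.keyword_id = c_sum.keyword_id
    WHERE b_sum.cost > 10   -- filter on the already-aggregated value
    ORDER BY A.id
""").fetchall()

print(rows)
```

Only the row for A.id 1 survives the filter, with a cost of 11 and 30 clicks. Whether this beats the correlated subqueries on millions of rows depends on the indexes and the planner, so it is worth timing both on real data.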
Simply split the aggregate of the second table into a subquery as follows:
http://sqlfiddle.com/#!9/768745/27
select ad_group_keyword.*, SumCost, sum(keyword_click.clicks)
from ad_group_keyword
left join keyword_click on ad_group_keyword.keyword_id = keyword_click.keyword_id
left join (select ad_group_keyword.id, sum(cost) SumCost
from ad_group_keyword join ad_group_keyword_network on ad_group_keyword.id = ad_group_keyword_network.ad_group_keyword_id
where event_date >= '2015-12-27'
group by ad_group_keyword.id
having sum(cost) > 20
) Cost on Cost.id=ad_group_keyword.id
where
(keyword_click.date is null or keyword_click.date >= '2015-12-27')
and status = 2
group by ad_group_keyword.id

SQL Inner Join using Distinct and Order by Desc

I have two tables. Table A has over 8000 records and continues to grow with time.
Table B has only 5 or so records and grows rarely but does grow sometimes.
I want to query Table A's last records where the Id for Table A matches Table B. The problem is that I am getting all the rows from Table A. I just need the ones where Table A and B match once. These are unique Ids that are assigned when a new row is inserted into Table B and are never repeated.
Any help is most appreciated.
SELECT a.nshift,
a.loeeworkcellid,
b.loeeconfigworkcellid,
b.loeescheduleid,
b.sdescription,
b.sshortname
FROM oeeworkcell a
INNER JOIN dbo.oeeconfigworkcell b
ON a.loeeconfigworkcellid = b.loeeconfigworkcellid
ORDER BY a.loeeworkcellid DESC
I am assuming you want only the latest row (as you said) from Table A for each match, but the JOIN is giving you all the rows. You can use ROW_NUMBER() to number the rows within each group, then apply the join and filter with a WHERE clause to keep only the first row per group. You can try something like this:
;WITH CTE
AS
(
SELECT * , ROW_NUMBER() OVER(PARTITION BY loeeconfigworkcellid ORDER BY loeeworkcellid desc) AS Rn
FROM oeeworkcell
)
SELECT a.nshift,
a.loeeworkcellid,
b.loeeconfigworkcellid,
b.loeescheduleid,
b.sdescription,
b.sshortname
FROM CTE a
INNER JOIN dbo.oeeconfigworkcell b
ON a.loeeconfigworkcellid = b.loeeconfigworkcellid
WHERE
a.Rn = 1
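A small sqlite3 sketch of the same ROW_NUMBER() pattern follows. It assumes an SQLite build with window-function support (3.25 or later), trims the tables down to a few columns, and uses invented rows; "latest" is taken to mean the highest loeeworkcellid per loeeconfigworkcellid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oeeconfigworkcell (loeeconfigworkcellid INTEGER PRIMARY KEY,
                                    sshortname TEXT);
    CREATE TABLE oeeworkcell (loeeworkcellid INTEGER PRIMARY KEY,
                              loeeconfigworkcellid INTEGER,
                              nshift INTEGER);
    -- hypothetical data: two config rows, five history rows
    INSERT INTO oeeconfigworkcell VALUES (1, 'CellA'), (2, 'CellB');
    INSERT INTO oeeworkcell VALUES
        (101, 1, 1), (102, 1, 2), (103, 2, 1), (104, 1, 3), (105, 2, 2);
""")

latest = conn.execute("""
    WITH cte AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY loeeconfigworkcellid
                                  ORDER BY loeeworkcellid DESC) AS rn
        FROM oeeworkcell
    )
    SELECT a.loeeworkcellid, a.nshift, b.sshortname
    FROM cte AS a
    INNER JOIN oeeconfigworkcell AS b
            ON a.loeeconfigworkcellid = b.loeeconfigworkcellid
    WHERE a.rn = 1          -- keep only the newest row per config id
    ORDER BY a.loeeworkcellid DESC
""").fetchall()

print(latest)
```

This returns one row per config id: 105 for CellB and 104 for CellA.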
You need to group your data and keep only the rows that satisfy a HAVING condition on the minimum id.
SELECT a.nshift,
a.loeeworkcellid,
b.loeeconfigworkcellid,
b.loeescheduleid,
b.sdescription,
b.sshortname
FROM oeeworkcell a
INNER JOIN dbo.oeeconfigworkcell b
ON a.loeeconfigworkcellid = b.loeeconfigworkcellid
group by
a.nshift,
a.loeeworkcellid,
b.loeeconfigworkcellid,
b.loeescheduleid,
b.sdescription,
b.sshortname
having a.loeeworkcellid = min(a.loeeworkcellid)

Sum Distinct Rows Only In Sql Server

I have four tables, in which First has a one-to-many relation with the other three tables, named Second, Third, and Fourth respectively. I want to sum only the distinct rows returned by the select query. Here is the query I have tried so far.
select count(distinct First.Order_id) as [No.Of Orders],sum( First.Amount) as [Amount] from First
inner join Second on First.Order_id=Second.Order_id
inner join Third on Third.Order_id=Second.Order_id
inner join Fourth on Fourth.Order_id=Third.Order_id
The outcome of this query is :
No.Of Orders    Amount
7               69
But this Amount should be 49, because the sum of the Amount column in First is 49. Because of the inner joins and the one-to-many relationships, the sum also includes the duplicated rows. How can I avoid this? Kindly guide me.
I think the problem is cartesian products in the joins (for a given id). You can solve this using row_number():
select count(t1234.Order_id) as [No.Of Orders], sum(t1234.Amount) as [Amount]
from (select First.*,
row_number() over (partition by First.Order_id order by First.Order_id) as seqnum
from First inner join
Second
on First.Order_id=Second.Order_id inner join
Third
on Third.Order_id=Second.Order_id inner join
Fourth
on Fourth.Order_id=Third.Order_id
) t1234
where seqnum = 1;
By the way, you could also express this using conditions in the where clause, because you appear to be using the joins only for filtering:
select count(First.Order_id) as [No.Of Orders], sum(First.Amount) as [Amount]
from First
where exists (select 1 from second where First.Order_id=Second.Order_id) and
exists (select 1 from third where First.Order_id=third.Order_id) and
exists (select 1 from fourth where First.Order_id=fourth.Order_id);
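The difference between joining and filtering with EXISTS can be checked with a short sqlite3 sketch. The table names first_t, second_t, third_t and fourth_t are hypothetical stand-ins for First through Fourth, and the rows are invented, so the totals will not match the 69 and 49 from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_t  (Order_id INTEGER PRIMARY KEY, Amount INTEGER);
    CREATE TABLE second_t (Order_id INTEGER);
    CREATE TABLE third_t  (Order_id INTEGER);
    CREATE TABLE fourth_t (Order_id INTEGER);
    -- hypothetical data: order 3 has no child rows at all
    INSERT INTO first_t  VALUES (1, 10), (2, 20), (3, 5);
    INSERT INTO second_t VALUES (1), (1), (2);
    INSERT INTO third_t  VALUES (1), (2), (2);
    INSERT INTO fourth_t VALUES (1), (2);
""")

# Joins multiply each order by its number of child rows before summing.
joined = conn.execute("""
    SELECT COUNT(DISTINCT f.Order_id), SUM(f.Amount)
    FROM first_t AS f
    INNER JOIN second_t AS s ON f.Order_id = s.Order_id
    INNER JOIN third_t  AS t ON t.Order_id = s.Order_id
    INNER JOIN fourth_t AS u ON u.Order_id = t.Order_id
""").fetchone()

# EXISTS only tests for a match, so each order is counted exactly once.
filtered = conn.execute("""
    SELECT COUNT(f.Order_id), SUM(f.Amount)
    FROM first_t AS f
    WHERE EXISTS (SELECT 1 FROM second_t AS s WHERE s.Order_id = f.Order_id)
      AND EXISTS (SELECT 1 FROM third_t  AS t WHERE t.Order_id = f.Order_id)
      AND EXISTS (SELECT 1 FROM fourth_t AS u WHERE u.Order_id = f.Order_id)
""").fetchone()

print("joined:", joined)
print("exists:", filtered)
```

Both versions count 2 orders, but the joined sum is 60 while the EXISTS version gives the true 30.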

Select SUM from multiple tables

I keep getting the wrong sum value when I join 3 tables.
Here is a pic of the ERD of the table:
(Original here: http://dl.dropbox.com/u/18794525/AUG%207%20DUMP%20STAN.png )
Here is the query:
select SUM(gpCutBody.actualQty) as cutQty , SUM(gpSewBody.quantity) as sewQty
from jobOrder
inner join gpCutHead on gpCutHead.joNum = jobOrder.joNum
inner join gpSewHead on gpSewHead.joNum = jobOrder.joNum
inner join gpCutBody on gpCutBody.gpCutID = gpCutHead.gpCutID
inner join gpSewBody on gpSewBody.gpSewID = gpSewHead.gpSewID
If you are only interested in the quantities of cuts and sews for all orders, the simplest way to do it would be like this:
select (select SUM(gpCutBody.actualQty) from gpCutBody) as cutQty,
(select SUM(gpSewBody.quantity) from gpSewBody) as sewQty
(This assumes that cuts and sews will always have associated job orders.)
If you want to see a breakdown of cuts and sews by job order, something like this might be preferable:
select joNum, SUM(actualQty) as cutQty, SUM(quantity) as sewQty
from (select joNum, actualQty, 0 as quantity
from gpCutBody
union all
select joNum, 0 as actualQty, quantity
from gpSewBody) sc
group by joNum
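Here is a sketch of the union all approach in sqlite3, next to a direct join for contrast. Like the query above, it assumes both body tables carry a joNum column, which the ERD may or may not confirm; the column types and rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- assumption: joNum is available directly on both body tables
    CREATE TABLE gpCutBody (joNum TEXT, actualQty INTEGER);
    CREATE TABLE gpSewBody (joNum TEXT, quantity INTEGER);
    INSERT INTO gpCutBody VALUES ('JO1', 10), ('JO1', 5), ('JO2', 7);
    INSERT INTO gpSewBody VALUES ('JO1', 4), ('JO1', 6), ('JO1', 2);
""")

# Joining cuts to sews pairs every cut row with every sew row of the order.
joined = conn.execute("""
    SELECT c.joNum, SUM(c.actualQty), SUM(s.quantity)
    FROM gpCutBody AS c
    INNER JOIN gpSewBody AS s ON s.joNum = c.joNum
    GROUP BY c.joNum
    ORDER BY c.joNum
""").fetchall()

# Stacking the rows with UNION ALL keeps each source row exactly once;
# the 0 placeholders leave the other measure's sum untouched.
stacked = conn.execute("""
    SELECT joNum, SUM(actualQty) AS cutQty, SUM(quantity) AS sewQty
    FROM (SELECT joNum, actualQty, 0 AS quantity FROM gpCutBody
          UNION ALL
          SELECT joNum, 0 AS actualQty, quantity FROM gpSewBody) AS sc
    GROUP BY joNum
    ORDER BY joNum
""").fetchall()

print("joined:", joined)
print("stacked:", stacked)
```

For JO1 the join reports 45 cuts and 24 sews, while the stacked version reports the true 15 and 12. It also keeps JO2, which has cuts but no sews and is dropped by the inner join.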
Mark's approach is a good one. I want to suggest the alternative of doing the group by's before the union, simply because this can be a more general approach for summing along multiple dimensions.
Your problem is that you have two dimensions that you want to sum along, and you are getting a cross product of the values in the join.
select coalesce(act.joNum, q.joNum) as joNum, act.quantity as ActualQty, q.quantity as Quantity
from (select joNum, sum(actualQty) as quantity
from gpCutBody
group by joNum
) act full outer join
(select joNum, sum(quantity) as quantity
from gpSewBody
group by joNum
) q
on act.joNum = q.joNum
(I have kept Mark's assumption that doing this by joNum is the desired output.)