How can I join 3 tables and calculate the correct sum of fields from 2 tables, without duplicate rows? - sql

I have tables A, B, C. Table A is linked to B, and table A is linked to C. I want to join the 3 tables and find the sum of B.cost and the sum of C.clicks. However, it is not giving me the expected value, and when I select everything without the group by, it is showing duplicate rows. I am expecting the row values from B to roll up into a single sum, and the row values from C to roll up into a single sum.
My query looks like
select A.*, sum(B.cost), sum(C.clicks) from A
join B
left join C
group by A.id
having sum(cost) > 10
I tried to group by B.a_id and C.another_field_in_a also, but that didn't work.
Here is a DB fiddle with all of the data and the full query:
http://sqlfiddle.com/#!9/768745/13
Notice how the sum fields are greater than the sum of the individual tables? I'm expecting the sums to be equal, containing only the rows of the table B and C once. I also tried adding distinct but that didn't help.
I'm using Postgres. (The fiddle is set to MySQL though.) Ultimately I will want to use a having clause to select the rows according to their sums. This query will be for millions of rows.

If I understand the logic correctly, the problem is the Cartesian product caused by the two joins. Your query is a bit hard to follow, but I think the intent is better handled with correlated subqueries:
select k.*,
(select sum(cost)
from ad_group_keyword_network n
where n.event_date >= '2015-12-27' and
n.ad_group_keyword_id = 1210802 and
k.id = n.ad_group_keyword_id
) as cost,
(select sum(clicks)
from keyword_click c
where (c.date is null or c.date >= '2015-12-27') and
k.keyword_id = c.keyword_id
) as clicks
from ad_group_keyword k
where k.status = 2 ;
Here is the corresponding SQL Fiddle.
EDIT:
The subselect should be faster than the group by on the unaggregated data. However, you need the right indexes: ad_group_keyword_network(ad_group_keyword_id, ad_group_keyword_id, event_date, cost) and keyword_click(keyword_id, date, clicks).

I found this (MySQL joining tables group by sum issue) and created a query like this
select *
from A
join (select B.a_id, sum(B.cost) as cost
from B
group by B.a_id) B on A.id = B.a_id
left join (select C.keyword_id, sum(C.clicks) as clicks
from C
group by C.keyword_id) C on A.keyword_id = C.keyword_id
group by A.id
having sum(cost) > 10
I don't know if it's efficient though. I don't know if it's more or less efficient than Gordon's. I ran both queries and this one seemed faster, 27s vs. 2m35s. Here is a fiddle: http://sqlfiddle.com/#!15/c61c74/10

Simply split the aggregate of the second table into a subquery as follows:
http://sqlfiddle.com/#!9/768745/27
select ad_group_keyword.*, SumCost, sum(keyword_click.clicks)
from ad_group_keyword
left join keyword_click on ad_group_keyword.keyword_id = keyword_click.keyword_id
left join (select ad_group_keyword.id, sum(cost) SumCost
from ad_group_keyword join ad_group_keyword_network on ad_group_keyword.id = ad_group_keyword_network.ad_group_keyword_id
where event_date >= '2015-12-27'
group by ad_group_keyword.id
having sum(cost) > 20
) Cost on Cost.id=ad_group_keyword.id
where
(keyword_click.date is null or keyword_click.date >= '2015-12-27')
and status = 2
group by ad_group_keyword.id

Related

SQL Query to remove duplicated data and take single column sum

I have the following table resulted from
SELECT m.MedName as [Medicine],m.MedSellPrice as [RetailPrice],m.MedType as [Type],
m.SoldQuantity as [Sold],m.Quantity as [Available],b.BillAmount as [Total Bill],b.BillDate
FROM BillMedicine AS bm LEFT JOIN
Medicine AS m
ON bm.MedicineID=m.id LEFT JOIN
Bill AS b
ON bm.BilIID = b. ID
but now I want to remove the repeated rows except the Sum of 'TotalBill'.
Use GROUP BY:
SELECT
m.MedName AS [Medicine],
m.MedSellPrice AS [RetailPrice],
m.MedType AS [Type],
m.SoldQuantity AS [Sold],
m.Quantity AS [Available],
SUM(b.BillAmount) AS [Total Bill]
FROM BillMedicine AS bm
LEFT JOIN Medicine AS m
ON bm.MedicineID = m.id
LEFT JOIN Bill AS b
ON bm.BilIID = b.ID
GROUP BY
m.MedName,
m.MedSellPrice,
m.MedType,
m.SoldQuantity,
m.Quantity;
Note that for the billing date, the two "duplicate" records you have highlighted have different dates. It is not clear which date, if any, you want to report here. I have omitted this column.
GROUP BY Is Best Option for DUPLICATE DATE Removed & SUM.
Select Column1,column2....., SUM(Total) as Total From Tablename Group BY column1,column2
You seem to want most (or all) columns from m and then the sum from another table. One method is a lateral join or correlated subquery:
SELECT m.*, -- or whatever columns you want,
(SELECT SUM(b.BillAmount)
FROM BillMedicine bm JOIN
Bill b
ON bm.BilIID = b.ID
WHERE bm.MedicineID = m.id
) as [Total Bill]
FROM Medicine m ;
I suggest this approach for several reasons.
This is often more efficient than an outer aggregation.
You have LEFT JOINs but they do not look correct. I suspect you want to start with the Medicine table.
You are including a date/time in the results, but clearly that is not appropriate when combining multiple rows.

SQL joined tables are causing duplicates

So table A is an overall table of policy_id information, while table b is policy_id's with claims attached. Not all of the id's in A exist in B, but I want to join the two tables and sum(total claims).
The issue is that the sum is way higher than the actual sum within the table itself.
Here is what I've tried so far:
select a.policy_id, coalesce(sum(b.claim_amt), 0)
from database.table1 as a
left join database2.table2 as b on a.policy_id = b.policy_id
where product_code = 'CI'
group by a.policy_id
The id's that don't exist in b show up just fine with a 0 next to them, it's the ones that do exist where the claim_amt's seem like they're being duplicated heavily in the sum.
I suspect your policy_id in table1 are not unique and that leads to the doubled,tripled ,etc. amounts
You could aggregate the sums from table2 in a CTE to get around this.
WITH CTE AS (
SELECT
policy_id
coalesce(sum(claim_amt), 0) as sum_amt
FROM database2.table2
group by policy_id
)
select a.policy_id, b.sum_amt
from database.table1 as a
left join CTE as b on a.policy_id = b.policy_id
where product_code = 'CI'

Match two tables based on minimum dates efficiently

I have two tables one which contains quarterly data and one which contains daily data. I would like to join the two tables such that for each day in the daily data the quarterly data for that quarterly is selected and returned daily. I am working with Postgres 9.3.
The current query is as follows:
select
a.ID,
a.datadate,
b.*,
case when a.datadate = b.rdq then 1 else 0 end as VALID
from proj_data a, proj_rat b
where a.id = b.id
and b.rdq = (select min(rdq)
from proj_rat c
where a.id = c.id and a.datadate >= c.rdq);
But it is excruciatingly slow and I need to do this for several thousand IDs. Can anyone suggest a more efficient solution?
This eliminates the need for a subquery in the where clause
select
ID,
a.datadate,
b.*,
(a.datadate = b.rdq)::integer as VALID
from
proj_data a
inner join
(
select distinct on (id, rdq) *
from project_rat
order by id, rdq
) b using(id)
where a.datadate >= b.rdq;

Self Join bringing too many records

I have this query to express a set of business rules.
To get the information I need, I tried joining the table on itself but that brings back many more records than are actually in the table. Below is the query I've tried. What am I doing wrong?
SELECT DISTINCT a.rep_id, a.rep_name, count(*) AS 'Single Practitioner'
FROM [SE_Violation_Detection] a inner join [SE_Violation_Detection] b
ON a.rep_id = b.rep_id and a.hcp_cid = b.hcp_cid
group by a.rep_id, a.rep_name
having count(*) >= 2
You can accomplish this with the having clause:
select a, b, count(*) c
from etc
group by a, b
having count(*) >= some number
I figured out a simpler way to get the information I need for one of the queries. The one above is still wrong.
--Rep violation for different HCP more than 5 times
select distinct rep_id,rep_name,count(distinct hcp_cid)
AS 'Multiple Practitioners'
from dbo.SE_Violation_Detection
group by rep_id,rep_name
having count(distinct hcp_cid)>4
order by count(distinct hcp_cid)

Rewrite SQL and use of group by

I have written below sql for one of the requirement and is fetching my results. But, I am wondering if there is any better way of writing this query rather than using alias table as A.
SELECT A.*,B.OPRDEFNDESC FROM
( select OPRID_ENTERED_BY ,COUNT(*)
from ps_req_hdr
where entered_dt > '01-JUL-2012'
GROUP BY OPRID_ENTERED_BY
ORDER BY COUNT(*) DESC) A, PSOPRDEFN B
WHERE A.OPRID_ENTERED_BY=B.OPRID
You may be able to use a simple INNER JOIN to do the same thing...
SELECT A.OPRID_ENTERED_BY, COUNT(*), B.OPRDEFNDESC
FROM ps_req_hdr A
JOIN PSOPRDEFN B ON A.OPRID_ENTERED_BY = B.OPRID
WHERE A.entered_dt > '01-JUL-2012'
GROUP BY A.OPRID_ENTERED_BY, B.OPRDEFNDESC
ORDER BY COUNT(*) DESC
NOTE
As per the comments below, the COUNT(*) result for this query will NOT include records that don't have corresponding matches in table B, and it will inflate for non-unique matches in table B. What this means is: if B.OPRID is not a unique field or if A.OPRID_ENTERED_BY is not a foreign key for B.OPRID then this answer will not yield the same results as the original query.