Self Join bringing too many records - sql

I have this query to express a set of business rules.
To get the information I need, I tried joining the table on itself but that brings back many more records than are actually in the table. Below is the query I've tried. What am I doing wrong?
SELECT DISTINCT a.rep_id, a.rep_name, count(*) AS 'Single Practitioner'
FROM [SE_Violation_Detection] a inner join [SE_Violation_Detection] b
ON a.rep_id = b.rep_id and a.hcp_cid = b.hcp_cid
group by a.rep_id, a.rep_name
having count(*) >= 2

You can accomplish this with the having clause:
select a, b, count(*) c
from etc
group by a, b
having count(*) >= some number

I figured out a simpler way to get the information I need for one of the queries. The one above is still wrong.
--Rep violation for different HCP more than 5 times
select distinct rep_id,rep_name,count(distinct hcp_cid)
AS 'Multiple Practitioners'
from dbo.SE_Violation_Detection
group by rep_id,rep_name
having count(distinct hcp_cid)>4
order by count(distinct hcp_cid)

Related

Grouping removes (a lot) of lines?

I'm very confused by the results I'm getting between 2 queries that I thought should be identical.
select count(distinct client_id)
from database1 a
inner join database2 b on a.client_id=b.client_id and b.year = 2021
and
select clienttype, count(distinct client_id)
from database1 a
inner join database2 b on a.client_id=b.client_id and b.year = 2021
group by 1
First one gives me 200,000 while when I sum all the clienttypes from the second one it gives me 300,000. And in the result from the second query, NULL clienttypes are counted (roughly 50,000 so it doesn't explain the difference anyways).
Any idea why those two are not the same? All I'm doing is breaking it down by clienttype, did I miss something?
Thanks
You clearly have examples where client_id is in multiple types.
You can find these using:
select client_id, count(distinct clienttype)
from database1
group by client_id
having count(distinct clienttype) > 1;

SQL join count and select query

I have two tables, one is a list of 'gangs' and one is a list of 'gang_members' the gang_members.gang_id refers to the gang.id they are in, I know how to count all the members in one gang, but I need to join the following queries into one:
SELECT * FROM gangs LIMIT 8
SELECT count(gang_id) FROM gangs_members WHERE gang_id = <GANG ID>
I think this is possible, I could do it in a loop while it's going through the gangs but that would be inefficient
SELECT A.*, B.RC
FROM gangs A
LEFT JOIN (SELECT gang_id, COUNT(*) AS RC FROM gangs_members GROUP BY gang_id) B ON A.gang_id=B.gang_id
Probably something like this
SELECT count(gang_id)
FROM gangs_members
WHERE gang_id IN (SELECT gang_id FROM gangs LIMIT 8)

How can I join 3 tables and calculate the correct sum of fields from 2 tables, without duplicate rows?

I have tables A, B, C. Table A is linked to B, and table A is linked to C. I want to join the 3 tables and find the sum of B.cost and the sum of C.clicks. However, it is not giving me the expected value, and when I select everything without the group by, it is showing duplicate rows. I am expecting the row values from B to roll up into a single sum, and the row values from C to roll up into a single sum.
My query looks like
select A.*, sum(B.cost), sum(C.clicks) from A
join B
left join C
group by A.id
having sum(cost) > 10
I tried to group by B.a_id and C.another_field_in_a also, but that didn't work.
Here is a DB fiddle with all of the data and the full query:
http://sqlfiddle.com/#!9/768745/13
Notice how the sum fields are greater than the sum of the individual tables? I'm expecting the sums to be equal, containing only the rows of the table B and C once. I also tried adding distinct but that didn't help.
I'm using Postgres. (The fiddle is set to MySQL though.) Ultimately I will want to use a having clause to select the rows according to their sums. This query will be for millions of rows.
If I understand the logic correctly, the problem is the Cartesian product caused by the two joins. Your query is a bit hard to follow, but I think the intent is better handled with correlated subqueries:
select k.*,
(select sum(cost)
from ad_group_keyword_network n
where n.event_date >= '2015-12-27' and
n.ad_group_keyword_id = 1210802 and
k.id = n.ad_group_keyword_id
) as cost,
(select sum(clicks)
from keyword_click c
where (c.date is null or c.date >= '2015-12-27') and
k.keyword_id = c.keyword_id
) as clicks
from ad_group_keyword k
where k.status = 2 ;
Here is the corresponding SQL Fiddle.
EDIT:
The subselect should be faster than the group by on the unaggregated data. However, you need the right indexes: ad_group_keyword_network(ad_group_keyword_id, ad_group_keyword_id, event_date, cost) and keyword_click(keyword_id, date, clicks).
I found this (MySQL joining tables group by sum issue) and created a query like this
select *
from A
join (select B.a_id, sum(B.cost) as cost
from B
group by B.a_id) B on A.id = B.a_id
left join (select C.keyword_id, sum(C.clicks) as clicks
from C
group by C.keyword_id) C on A.keyword_id = C.keyword_id
group by A.id
having sum(cost) > 10
I don't know if it's efficient though. I don't know if it's more or less efficient than Gordon's. I ran both queries and this one seemed faster, 27s vs. 2m35s. Here is a fiddle: http://sqlfiddle.com/#!15/c61c74/10
Simply split the aggregate of the second table into a subquery as follows:
http://sqlfiddle.com/#!9/768745/27
select ad_group_keyword.*, SumCost, sum(keyword_click.clicks)
from ad_group_keyword
left join keyword_click on ad_group_keyword.keyword_id = keyword_click.keyword_id
left join (select ad_group_keyword.id, sum(cost) SumCost
from ad_group_keyword join ad_group_keyword_network on ad_group_keyword.id = ad_group_keyword_network.ad_group_keyword_id
where event_date >= '2015-12-27'
group by ad_group_keyword.id
having sum(cost) > 20
) Cost on Cost.id=ad_group_keyword.id
where
(keyword_click.date is null or keyword_click.date >= '2015-12-27')
and status = 2
group by ad_group_keyword.id

Rewrite SQL and use of group by

I have written below sql for one of the requirement and is fetching my results. But, I am wondering if there is any better way of writing this query rather than using alias table as A.
SELECT A.*,B.OPRDEFNDESC FROM
( select OPRID_ENTERED_BY ,COUNT(*)
from ps_req_hdr
where entered_dt > '01-JUL-2012'
GROUP BY OPRID_ENTERED_BY
ORDER BY COUNT(*) DESC) A, PSOPRDEFN B
WHERE A.OPRID_ENTERED_BY=B.OPRID
You may be able to use a simple INNER JOIN to do the same thing...
SELECT A.OPRID_ENTERED_BY, COUNT(*), B.OPRDEFNDESC
FROM ps_req_hdr A
JOIN PSOPRDEFN B ON A.OPRID_ENTERED_BY = B.OPRID
WHERE A.entered_dt > '01-JUL-2012'
GROUP BY A.OPRID_ENTERED_BY, B.OPRDEFNDESC
ORDER BY COUNT(*) DESC
NOTE
As per the comments below, the COUNT(*) result for this query will NOT include records that don't have corresponding matches in table B, and it will inflate for non-unique matches in table B. What this means is: if B.OPRID is not a unique field or if A.OPRID_ENTERED_BY is not a foreign key for B.OPRID then this answer will not yield the same results as the original query.

Update using Distinct SUM

I have found a few good resources that show I should be able to merge a select query with an update, but I just can't get my head around of the correct formatting.
I have a select statement that is getting info for me, and I want to pretty much use those results to Update an account table that matches the accountID in the select query.
Here is the select statement:
SELECT DISTINCT SUM(b.workers)*tt.mealTax as MealCost,b.townID,b.accountID
FROM buildings AS b
INNER JOIN town_tax AS tt ON tt.townID = b.townID
GROUP BY b.townID,b.accountID
So in short I want the above query to be merged with:
UPDATE accounts AS a
SET a.wealth = a.wealth - MealCost
Where MealCost is the result from the select query. I am sure there is a way to put this into one, I just haven't quite been able to connect the dots to get it to run consistently without separating into two queries.
First, you don't need the distinct when you have a group by.
Second, how do you intend to link the two results? The SELECT query is returning multiple rows per account (one for each town). Presumably, the accounts table has only one row. Let's say that you wanted the average MealCost for the update.
The select query to get this is:
SELECT accountID, avg(MealCost) as avg_Mealcost
FROM (SELECT SUM(b.workers)*tt.mealTax as MealCost, b.townID, b.accountID
FROM buildings AS b INNER JOIN
town_tax AS tt
ON tt.townID = b.townID
GROUP BY b.townID,b.accountID
) a
GROUP BY accountID
Now, to put this into an update, you can use syntax like the following:
UPDATE accounts
set accounts.wealth = accounts.wealth + asum.avg_mealcost
from (SELECT accountID, avg(MealCost) as avg_Mealcost
FROM (SELECT SUM(b.workers)*tt.mealTax as MealCost, b.townID, b.accountID
FROM buildings AS b INNER JOIN
town_tax AS tt
ON tt.townID = b.townID
GROUP BY b.townID,b.accountID
) a
GROUP BY accountID
) asum
where accounts.accountid = asum.accountid
This uses SQL Server syntax, which I believe is the same as for Oracle and most other databases. Mysql puts the "from" clause before the "set" and allows an alias on "update accounts".