Count the overlapping values between two tables? - sql

I have two tables that are structured the same with a sequence column and I am trying to count the number of sequences that show up in two different tables.
I am using this right now:
SELECT A.sequence FROM p2.pool A WHERE EXISTS (SELECT * from
p1.pool B WHERE B.sequence = A.sequence)
And then I was going to count the number of results.
Is there an easier way to do this using COUNT so I don't have to get all of the results first?

Yes, there is an easier way using COUNT:
SELECT COUNT(*)
FROM p2.pool A
WHERE EXISTS (SELECT *
FROM p1.pool B
WHERE B.sequence = A.sequence)
You could also use a join instead of a subquery, but the speed is unlikely to change:
SELECT COUNT(*)
FROM p2.pool A
JOIN p1.pool B ON A.sequence = B.sequence

Related

SQL: how to count entries in multiple tables by value?

I have two tables, question & field. I need to count entries , with coincidental value of template_id (both tables contains).
Please advice, how to do it?
select count(q.*)
from question q
left join field f on f.template.id = q.template_id
In StackOverflow one should show ones own attempt, show that some effort was done.
Above inner join is probably what you meant. Try first select q.*, f.*.
SELECT
COUNT(*) AS TotalRecords
FROM question q
INNER JOIN field f ON f.template_id = q.template_id
If you want the count of distinct template_id in the two tables, use JOIN and COUNT(DISTINCT):
select count(distinct q.template_id)
from question q join
field f
on f.template_id = q.template_id;
If you use count(*) you will get a count of matching rows, not template_ids, so duplicates will affect the result.
If template_id is known to be unique in one of the tables (say question), then exists is probably more efficient:
select count(*)
from question q
where exists (select 1
from field f
where f.template_id = q.template_id
);

How can I join 3 tables and calculate the correct sum of fields from 2 tables, without duplicate rows?

I have tables A, B, C. Table A is linked to B, and table A is linked to C. I want to join the 3 tables and find the sum of B.cost and the sum of C.clicks. However, it is not giving me the expected value, and when I select everything without the group by, it is showing duplicate rows. I am expecting the row values from B to roll up into a single sum, and the row values from C to roll up into a single sum.
My query looks like
select A.*, sum(B.cost), sum(C.clicks) from A
join B
left join C
group by A.id
having sum(cost) > 10
I tried to group by B.a_id and C.another_field_in_a also, but that didn't work.
Here is a DB fiddle with all of the data and the full query:
http://sqlfiddle.com/#!9/768745/13
Notice how the sum fields are greater than the sum of the individual tables? I'm expecting the sums to be equal, containing only the rows of the table B and C once. I also tried adding distinct but that didn't help.
I'm using Postgres. (The fiddle is set to MySQL though.) Ultimately I will want to use a having clause to select the rows according to their sums. This query will be for millions of rows.
If I understand the logic correctly, the problem is the Cartesian product caused by the two joins. Your query is a bit hard to follow, but I think the intent is better handled with correlated subqueries:
select k.*,
(select sum(cost)
from ad_group_keyword_network n
where n.event_date >= '2015-12-27' and
n.ad_group_keyword_id = 1210802 and
k.id = n.ad_group_keyword_id
) as cost,
(select sum(clicks)
from keyword_click c
where (c.date is null or c.date >= '2015-12-27') and
k.keyword_id = c.keyword_id
) as clicks
from ad_group_keyword k
where k.status = 2 ;
Here is the corresponding SQL Fiddle.
EDIT:
The subselect should be faster than the group by on the unaggregated data. However, you need the right indexes: ad_group_keyword_network(ad_group_keyword_id, ad_group_keyword_id, event_date, cost) and keyword_click(keyword_id, date, clicks).
I found this (MySQL joining tables group by sum issue) and created a query like this
select *
from A
join (select B.a_id, sum(B.cost) as cost
from B
group by B.a_id) B on A.id = B.a_id
left join (select C.keyword_id, sum(C.clicks) as clicks
from C
group by C.keyword_id) C on A.keyword_id = C.keyword_id
group by A.id
having sum(cost) > 10
I don't know if it's efficient though. I don't know if it's more or less efficient than Gordon's. I ran both queries and this one seemed faster, 27s vs. 2m35s. Here is a fiddle: http://sqlfiddle.com/#!15/c61c74/10
Simply split the aggregate of the second table into a subquery as follows:
http://sqlfiddle.com/#!9/768745/27
select ad_group_keyword.*, SumCost, sum(keyword_click.clicks)
from ad_group_keyword
left join keyword_click on ad_group_keyword.keyword_id = keyword_click.keyword_id
left join (select ad_group_keyword.id, sum(cost) SumCost
from ad_group_keyword join ad_group_keyword_network on ad_group_keyword.id = ad_group_keyword_network.ad_group_keyword_id
where event_date >= '2015-12-27'
group by ad_group_keyword.id
having sum(cost) > 20
) Cost on Cost.id=ad_group_keyword.id
where
(keyword_click.date is null or keyword_click.date >= '2015-12-27')
and status = 2
group by ad_group_keyword.id

Difference between Two Queries - Join vs IN

I have the following two queries. Query1 is returning 1000 as row count where as Query2 is returning 4000 as row count. Can someone please explain the difference between both the queries. I was hoping both would return same count.
Query1:
SELECT COUNT(*)
FROM TableA A
WHERE A.VIN IN (
SELECT VIN
FROM TableB B, TableC C
WHERE B.MODEL_YEAR = '2014' AND B.VIN_NBR = C.VIN
)
Query2:
SELECT COUNT(*)
FROM TABLEA A, TableB B, TableC C
WHERE B.MODEL_YEAR = '2014' AND B.VIN_NBR = C.VIN AND A.VIN = C.VIN
In many cases, they will return the same answer, but not necessarily. The first counts the number of rows in A that match the conditions -- each row is counted only once, regardless of the number of matches. The second does a join, which can multiply the number of rows.
The second query would be equivalent in results if it used count(distinct A.id), where id is unique or a primary key.
That said, although they are similar in functionality, how they are executed can be quite different. Different SQL engines might do a better job of optimizing one version or the other.
By the way, you should avoid the archaic join syntax that you are using. Since 1992, explicit joins have been part of SQL syntax.

Self Join bringing too many records

I have this query to express a set of business rules.
To get the information I need, I tried joining the table on itself but that brings back many more records than are actually in the table. Below is the query I've tried. What am I doing wrong?
SELECT DISTINCT a.rep_id, a.rep_name, count(*) AS 'Single Practitioner'
FROM [SE_Violation_Detection] a inner join [SE_Violation_Detection] b
ON a.rep_id = b.rep_id and a.hcp_cid = b.hcp_cid
group by a.rep_id, a.rep_name
having count(*) >= 2
You can accomplish this with the having clause:
select a, b, count(*) c
from etc
group by a, b
having count(*) >= some number
I figured out a simpler way to get the information I need for one of the queries. The one above is still wrong.
--Rep violation for different HCP more than 5 times
select distinct rep_id,rep_name,count(distinct hcp_cid)
AS 'Multiple Practitioners'
from dbo.SE_Violation_Detection
group by rep_id,rep_name
having count(distinct hcp_cid)>4
order by count(distinct hcp_cid)

SQL, only if matching all foreign key values to return the record?

I have two tables
Table A
type_uid, allowed_type_uid
9,1
9,2
9,4
1,1
1,2
24,1
25,3
Table B
type_uid
1
2
From table A I need to return
9
1
Using a WHERE IN clause I can return
9
1
24
SELECT
TableA.type_uid
FROM
TableA
INNER JOIN
TableB
ON TableA.allowed_type_uid = TableB.type_uid
GROUP BY
TableA.type_uid
HAVING
COUNT(distinct TableB.type_uid) = (SELECT COUNT(distinct type_uid) FROM TableB)
Join the two tables togeter, so that you only have the records matching the types you are interested in.
Group the result set by TableA.type_uid.
Check that each group has the same number of allowed_type_uid values as exist in TableB.type_uid.
distinct is required only if there can be duplicate records in either table. If both tables are know to only have unique values, the distinct can be removed.
It should also be noted that as TableA grows in size, this type of query will quickly degrade in performance. This is because indexes are not actually much help here.
It can still be a useful structure, but not one where I'd recommend running the queries in real-time. Rather use it to create another persisted/cached result set, and use this only to refresh those results as/when needed.
Or a slightly cheaper version (resource wise):
SELECT
Data.type_uid
FROM
A AS Data
CROSS JOIN
B
LEFT JOIN
A
ON Data.type_uid = A.type_uid AND B.type_uid = A.allowed_type_uid
GROUP BY
Data.type_uid
HAVING
MIN(ISNULL(A.allowed_type_uid,-999)) != -999
Your explanation is not very clear. I think you want to get those type_uid's from table A where for all records in table B there is a matching A.Allowed_type_uid.
SELECT T2.type_uid
FROM (SELECT COUNT(*) as AllAllowedTypes FROM #B) as T1,
(SELECT #A.type_uid, COUNT(*) as AllowedTypes
FROM #A
INNER JOIN #B ON
#A.allowed_type_uid = #B.type_uid
GROUP BY #A.type_uid
) as T2
WHERE T1.AllAllowedTypes = T2.AllowedTypes
(Dems, you were faster than me :) )