Getting difference of two counts in SQL

I'm doing some QA in Netezza and I need to compare the counts from two separate SQL statements. This is the SQL that I am currently using:
SELECT COUNT(*) AS RECORD_COUNT
FROM db..EXT_ACXIOM_WUL_FILE A
LEFT JOIN (select distinct CURRENTLY_OPTED_IN_FL,mid_key from db..F_EMAIL) B
ON A.MID_KEY=B.MID_KEY
MINUS
SELECT COUNT(*)
FROM db..EXT_ACXIOM_WUL_FILE A
However, it seems like MINUS doesn't work like that. When the counts match, instead of returning 0, this returns null for RECORD_COUNT. I basically want the record count to be computed as:
record_count=count1-count2
So it is 0 if the counts are equal or the difference otherwise. What is the correct SQL for this?

SELECT
(
SELECT COUNT(*) AS RECORD_COUNT
FROM db..EXT_ACXIOM_WUL_FILE A
LEFT JOIN (select distinct CURRENTLY_OPTED_IN_FL,mid_key from db..F_EMAIL) B
ON A.MID_KEY=B.MID_KEY
) -
(
SELECT COUNT(*)
FROM db..EXT_ACXIOM_WUL_FILE A
) TotalCount
Oracle's MINUS (EXCEPT in SQL Server) is a whole different animal :)
If you understand UNION and then think sets, you will understand MINUS / EXCEPT
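To make the set-versus-arithmetic distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite's EXCEPT behaves like MINUS for this purpose, and the tables wul_file and f_email are tiny made-up stand-ins for the ones in the question, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wul_file (mid_key INTEGER);
    CREATE TABLE f_email  (mid_key INTEGER, opted_in TEXT);
    INSERT INTO wul_file VALUES (1), (2), (3);
    INSERT INTO f_email  VALUES (1, 'Y'), (2, 'N');
""")

# Hypothetical stand-ins for the two count queries in the question.
joined = """
    SELECT COUNT(*)
    FROM wul_file a
    LEFT JOIN (SELECT DISTINCT opted_in, mid_key FROM f_email) b
      ON a.mid_key = b.mid_key
"""
plain = "SELECT COUNT(*) FROM wul_file"


def compare(connection):
    # EXCEPT removes rows of the left result that also appear on the right.
    set_diff = connection.execute(f"{joined} EXCEPT {plain}").fetchall()
    # Subtracting two scalar subqueries is ordinary arithmetic.
    arith_diff = connection.execute(
        f"SELECT ({joined}) - ({plain})"
    ).fetchone()[0]
    return set_diff, arith_diff


equal_case = compare(conn)  # both counts are 3

# mid_key 1 now matches two distinct rows, so the joined count becomes 4.
conn.execute("INSERT INTO f_email VALUES (1, 'N')")
unequal_case = compare(conn)

print(equal_case)    # ([], 0)
print(unequal_case)  # ([(4,)], 1)
```

When the counts match, the set difference is empty rather than 0, and when they differ it returns the left-hand count rather than the difference. Only the scalar subtraction gives the number the question asks for.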

MINUS is set difference, not for arithmetic operations.
You could do
SELECT COUNT(*) - (SELECT COUNT(*)
FROM db..EXT_ACXIOM_WUL_FILE A) AS Val
FROM db..EXT_ACXIOM_WUL_FILE A
LEFT JOIN (select distinct CURRENTLY_OPTED_IN_FL,
mid_key
from db..F_EMAIL) B
ON A.MID_KEY = B.MID_KEY
Or another option
SELECT COUNT(*) - COUNT(DISTINCT A.PrimaryKey) AS Val
FROM db..EXT_ACXIOM_WUL_FILE A
LEFT JOIN (select distinct CURRENTLY_OPTED_IN_FL,
mid_key
from db..F_EMAIL) B
ON A.MID_KEY = B.MID_KEY

I think this may be what you are looking for
SELECT COUNT(distinct(CURRENTLY_OPTED_IN_FL + F_EMAIL.MID_KEY)) - count(distinct(EXT_ACXIOM_WUL_FILE.MID_KEY))
FROM EXT_ACXIOM_WUL_FILE
LEFT OUTER JOIN F_EMAIL
ON JOIN F_EMAIL.MID_KEY = EXT_ACXIOM_WUL_FILE.MID_KEY

Related

Query all columns of table1 left join and count of the table2

I couldn't get this query working :
DOESN'T WORK
select
Region.*, count(secteur.*) count
from
Region
left join
secteur on secteur.region_id = Region.id
The solution I found is below. Is there a better solution using joins, or does the subquery not hurt performance? I ask because I have a very large dataset of about 500K rows.
WORKS BUT AFRAID OF PERFORMANCE ISSUES
select
Region.*,
(select count(*)
from Secteur
where Secteur.Region_id = region.id) count
from
Region
I would suggest:
select region.*, count(secteur.region_id) as count
from region left join secteur on region.id = secteur.region_id
group by region.id, region.field2, region.field3....
Note that count(table.field) will ignore nulls, whereas count(*) will include them.
Alternatively, left join on a subquery and use coalesce to avoid nulls:
select region.*, coalesce(t.c, 0) as count
from region left join
(select region_id, count(*) as c from secteur group by region_id) t on region.id = t.region_id
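The difference between count(*) and count(column) after a LEFT JOIN is easy to see on a small example. This is a sketch using Python's sqlite3 module with invented sample data; the region and secteur tables below only mirror the column names used in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE secteur (id INTEGER PRIMARY KEY, region_id INTEGER);
    INSERT INTO region  VALUES (1, 'North'), (2, 'South');
    INSERT INTO secteur VALUES (10, 1), (11, 1);  -- 'South' has no secteur
""")

rows = conn.execute("""
    SELECT r.name,
           COUNT(*)           AS star_count,    -- counts the NULL-extended row too
           COUNT(s.region_id) AS column_count   -- skips NULLs from the outer join
    FROM region r
    LEFT JOIN secteur s ON s.region_id = r.id
    GROUP BY r.id, r.name
    ORDER BY r.id
""").fetchall()

print(rows)  # [('North', 2, 2), ('South', 1, 0)]
```

For a region with no matching secteur rows, count(*) reports 1 because the outer join still produces one row, while counting a secteur column reports 0.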
I'd join region on an aggregate query of secteur:
SELECT r.*, COALESCE(s.cnt, 0)
FROM region r
LEFT JOIN (SELECT region_id, COUNT(*) AS cnt
FROM secteur
GROUP BY region_id) s ON s.region_id = r.id
I would go with this query:
select r.*,
(select count(*)
from Secteur s
where s.Region_id = r.id
) as num_secteurs
from Region r;
Then fix the performance problem by adding an index on Secteur(region_id):
create index idx_secteur_region on secteur(region_id);
You made two mistakes.
First: you tried to apply COUNT() to just one of the joined tables (the second one) with count(secteur.*). That does not work, because COUNT(), like any aggregate function, is computed over the rows of the joined result set, not over one of the tables that feed it.
In your first query, replace count(secteur.*) with a count of a single column from secteur, such as count(secteur.region_id), and do not forget to add the Region columns to the GROUP BY step. count(*) also runs, but with a LEFT JOIN it reports 1 for a region that has no secteur rows, because the NULL-extended row is still counted.
Second: the select list contains not only an aggregate function but also other columns (select Region.*), and you do not list them in GROUP BY. Every column in the SELECT list that is not wrapped in an aggregate function must also appear in GROUP BY.
Addendum: no, GROUP BY Region.* will not work; you have to list the columns in GROUP BY by their actual names.
So the correct form looks like this:
SELECT
Region.col1
, Region.col2
, count(secteur.region_id) AS count
from Region
left join
secteur on secteur.region_id = Region.id
GROUP BY Region.col1, Region.col2
Or, if you don't want to type each column name, use a window function. It does not collapse rows, so DISTINCT is needed to get one row per region:
SELECT DISTINCT
Region.*
, count(secteur.region_id) OVER (PARTITION BY Region.id) AS count
from Region
left join
secteur on secteur.region_id = Region.id

SUM a column count from two tables

I have this simple unioned query in SQL Server 2014 where I am getting counts of rows from each table, and then trying to add a TOTAL row at the bottom that will SUM the counts from both tables. I believe the problem is that the LEFT OUTER JOIN in the last part of the union only counts the rows from the first table:
SELECT A.TEST_CODE, B.DIVISION, COUNT(*)
FROM ALL_USERS B, SIGMA_TEST A
WHERE B.DOMID = A.DOMID
GROUP BY A.TEST_CODE, B.DIVISION
UNION
SELECT E.TEST_CODE, F.DIVISION, COUNT(*)
FROM BETA_TEST E, ALL_USERS F
WHERE E.DOMID = F.DOMID
GROUP BY E.TEST_CODE, F.DIVISION
UNION
SELECT 'TOTAL', '', COUNT(*)
FROM (SIGMA_TEST A LEFT OUTER JOIN BETA_TEST E ON A.DOMID
= E.DOMID )
Here is a sample of the results I am getting:
I would expect the TOTAL row to display a result of 6 (2+1+3=6)
I would like to avoid using a Common Table Expression (CTE) if possible. Thanks in advance!
Since you are counting users with matching DOMIDs in the first two statements, the final statement also needs to include the ALL_USERS table. The final statement should be:
SELECT 'TOTAL', '', COUNT(*)
FROM ALL_USERS G LEFT OUTER JOIN
SIGMA_TEST H ON G.DOMID = H.DOMID
LEFT OUTER JOIN BETA_TEST I ON I.DOMID = G.DOMID
WHERE (H.TEST_CODE IS NOT NULL OR I.TEST_CODE IS NOT NULL)
I would consider doing a UNION ALL first then COUNT:
SELECT COALESCE(TEST_CODE, 'TOTAL'),
DIVISION,
COUNT(*)
FROM (
SELECT A.TEST_CODE, B.DIVISION
FROM ALL_USERS B
INNER JOIN SIGMA_TEST A ON B.DOMID = A.DOMID
UNION ALL
SELECT E.TEST_CODE, F.DIVISION
FROM BETA_TEST E
INNER JOIN ALL_USERS F ON E.DOMID = F.DOMID ) AS T
GROUP BY GROUPING SETS ((TEST_CODE, DIVISION ), ())
Using GROUPING SETS you can easily get the total, so there is no need to add a third subquery.
Note: I assume you want just one count per (TEST_CODE, DIVISION). Otherwise you also have to group on the source table, as in @Gareth's answer.
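The "UNION ALL first, then COUNT" idea can be tried out with Python's sqlite3 module. SQLite has no GROUPING SETS, so this sketch swaps in a plainly different way of getting the grand total: an extra UNION ALL branch that counts the same combined rows. The sample data is invented so that the groups come out as 2, 1 and 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE all_users  (domid INTEGER, division TEXT);
    CREATE TABLE sigma_test (domid INTEGER, test_code TEXT);
    CREATE TABLE beta_test  (domid INTEGER, test_code TEXT);
    INSERT INTO all_users  VALUES (1, 'East'), (2, 'East'), (3, 'West'),
                                  (4, 'West'), (5, 'West');
    INSERT INTO sigma_test VALUES (1, 'S1'), (2, 'S1'), (3, 'S1');
    INSERT INTO beta_test  VALUES (3, 'B1'), (4, 'B1'), (5, 'B1');
""")

# One row per (test, user) match from either test table; nothing is counted yet.
combined = """
    SELECT a.test_code, b.division
    FROM all_users b
    INNER JOIN sigma_test a ON b.domid = a.domid
    UNION ALL
    SELECT e.test_code, f.division
    FROM beta_test e
    INNER JOIN all_users f ON e.domid = f.domid
"""

rows = conn.execute(f"""
    SELECT test_code, division, COUNT(*) AS n
    FROM ({combined}) AS t
    GROUP BY test_code, division
    UNION ALL
    -- Stand-in for the empty grouping set (): one count over all combined rows.
    SELECT 'TOTAL', '', COUNT(*)
    FROM ({combined}) AS t
""").fetchall()

counts = {(code, division): n for code, division, n in rows}
print(counts[("TOTAL", "")])  # 6
```

Because the total is computed over the same combined rows as the per-group counts, it always equals their sum. On SQL Server, GROUPING SETS gives the same result without repeating the derived table.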
I think you can achieve this with a single query. It seems your test tables have similar structures, so you can union them together and join the result to ALL_USERS. Finally, you can use GROUPING SETS to get the total:
SELECT ISNULL(T.TEST_CODE, 'TOTAL') AS TEST_CODE,
ISNULL(U.DIVISION, '') AS DIVISION,
COUNT(*)
FROM ALL_USERS AS U
INNER JOIN
( SELECT DOMID, TEST_CODE, 'SIGMA' AS SOURCETABLE
FROM SIGMA_TEST
UNION ALL
SELECT DOMID, TEST_CODE, 'BETA' AS SOURCETABLE
FROM BETA_TEST
) AS T
ON T.DOMID = U.DOMID
GROUP BY GROUPING SETS ((T.TEST_CODE, U.DIVISION, T.SOURCETABLE), ());
As an aside, the implicit join syntax you are using was replaced over a quarter of a century ago in ANSI 92. It is not wrong, but there seems to be little reason to continue to use it, especially when you are mixing and matching with explicit outer joins and implicit inner joins. Anyone else that might read your SQL will certainly appreciate consistency.

SQL join count and select query

I have two tables: one is a list of 'gangs' and the other is a list of 'gang_members'. gang_members.gang_id refers to the id of the gang the member is in. I know how to count all the members in one gang, but I need to combine the following queries into one:
SELECT * FROM gangs LIMIT 8
SELECT count(gang_id) FROM gangs_members WHERE gang_id = <GANG ID>
I think this is possible, I could do it in a loop while it's going through the gangs but that would be inefficient
SELECT A.*, COALESCE(B.RC, 0) AS RC
FROM gangs A
LEFT JOIN (SELECT gang_id, COUNT(*) AS RC FROM gangs_members GROUP BY gang_id) B ON A.id=B.gang_id
LIMIT 8
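Here is a small runnable sketch of the same join-on-an-aggregate idea, using Python's sqlite3 module. It assumes, as the question describes, that gangs.id is the key that gangs_members.gang_id points to. The name column and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gangs (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE gangs_members (id INTEGER PRIMARY KEY, gang_id INTEGER);
    INSERT INTO gangs VALUES (1, 'Red'), (2, 'Blue'), (3, 'Green');
    INSERT INTO gangs_members (gang_id) VALUES (1), (1), (3);
""")

rows = conn.execute("""
    SELECT g.id, g.name,
           COALESCE(m.member_count, 0) AS member_count  -- 0 for empty gangs
    FROM gangs g
    LEFT JOIN (SELECT gang_id, COUNT(*) AS member_count
               FROM gangs_members
               GROUP BY gang_id) m ON m.gang_id = g.id
    ORDER BY g.id
    LIMIT 8
""").fetchall()

print(rows)  # [(1, 'Red', 2), (2, 'Blue', 0), (3, 'Green', 1)]
```

The members are counted once per gang in the derived table, so the outer query needs no GROUP BY, and a gang without members still appears with a count of 0.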
Probably something like this
SELECT count(gang_id)
FROM gangs_members
WHERE gang_id IN (SELECT id FROM gangs LIMIT 8)

Movie and Year exist in one query but not both

I have two queries. The first returns the movie and year for movies that have more than two cast members, and the second returns the movies that have won more than two awards.
So I want to write a query that gives me the movie and year that occur in one query but not both. How can I do this?
The syntax is in Oracle.
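We can do this with MINUS and UNION.
The first set is the rows that exist only in table1.
The second set is the rows that exist only in table2.
Oracle evaluates set operators of equal precedence from left to right, so each MINUS needs its own parentheses:
(SELECT * FROM table1
MINUS
SELECT * FROM table2)
UNION
(SELECT * FROM table2
MINUS
SELECT * FROM table1)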
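The "in one result but not both" combination can be checked on a toy example. This is a sketch with Python's sqlite3 module, assuming SQLite's EXCEPT as a stand-in for MINUS; q1 and q2 are invented tables standing in for the two query results. SQLite does not accept bare parentheses around a compound SELECT, so each difference is wrapped in a subquery instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE q1 (movie TEXT, year INTEGER);
    CREATE TABLE q2 (movie TEXT, year INTEGER);
    INSERT INTO q1 VALUES ('Alpha', 2001), ('Beta', 2002);
    INSERT INTO q2 VALUES ('Beta', 2002), ('Gamma', 2003);
""")

# Each difference is computed on its own, then the two are unioned.
grouped = conn.execute("""
    SELECT * FROM (SELECT * FROM q1 EXCEPT SELECT * FROM q2)
    UNION
    SELECT * FROM (SELECT * FROM q2 EXCEPT SELECT * FROM q1)
    ORDER BY movie
""").fetchall()

# Without grouping, the chain runs left to right:
# ((q1 EXCEPT q2) UNION q2) EXCEPT q1, which drops every row of q1.
ungrouped = conn.execute("""
    SELECT * FROM q1
    EXCEPT SELECT * FROM q2
    UNION  SELECT * FROM q2
    EXCEPT SELECT * FROM q1
    ORDER BY movie
""").fetchall()

print(grouped)    # [('Alpha', 2001), ('Gamma', 2003)]
print(ungrouped)  # [('Gamma', 2003)]
```

The ungrouped chain loses the rows that exist only in the first result, which is why the grouping of the two differences matters.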
I want to write a query which will give me the movie and year which
occurs in one query but not both.
To do this, take the UNION of both queries, take the INTERSECT of both queries, and then MINUS the INTERSECT from the UNION. Like this:
((SELECT T2.movie_title,T2.release_year
FROM(SELECT b.movie_title,b.release_year, COUNT(b.movie_title) as NUMMOVIES
FROM ACTOR a FULL OUTER JOIN CAST_MEMBER b ON a.actor_name=b.actor_name
WHERE EXISTS(SELECT c.actor_name FROM CAST_MEMBER c WHERE c.actor_name=a.actor_name)
GROUP BY b.movie_title,b.release_year) T2
WHERE T2.NUMMOVIES > 2)
UNION
(SELECT a.movie_title,a.release_year
FROM MOVIE a
WHERE (SELECT COUNT(b.won) as Won_Counter
FROM NOMINATION b
WHERE b.movie_title=a.movie_title AND a.release_year=b.release_year AND b.won ='Yes') > 2))
MINUS
((SELECT T2.movie_title,T2.release_year
FROM(SELECT b.movie_title,b.release_year, COUNT(b.movie_title) as NUMMOVIES
FROM ACTOR a FULL OUTER JOIN CAST_MEMBER b ON a.actor_name=b.actor_name
WHERE EXISTS(SELECT c.actor_name FROM CAST_MEMBER c WHERE c.actor_name=a.actor_name)
GROUP BY b.movie_title,b.release_year) T2
WHERE T2.NUMMOVIES > 2)
INTERSECT
(SELECT a.movie_title,a.release_year
FROM MOVIE a
WHERE (SELECT COUNT(b.won) as Won_Counter
FROM NOMINATION b
WHERE b.movie_title=a.movie_title AND a.release_year=b.release_year AND b.won ='Yes') > 2))
Learn more about these operators here
I am sure there is a much better way to do this but we will need more information about your tables
You can do this in several ways. Here is a way that doesn't use minus:
with q1 as (
<first query here>
),
q2 as (
<second query here>
)
select q1.*
from q1
where not exists (select 1 from q2 where q2.movie = q1.movie);
This assumes that you want movies in the first query that are not in the second. It also assumes that the second does not return a year; otherwise that would be part of the where condition.

Only one expression can be specified in the select list when the subquery is not introduced with EXISTS

My query is as follows, and contains a subquery within it:
select count(distinct dNum)
from myDB.dbo.AQ
where A_ID in
(SELECT DISTINCT TOP (0.1) PERCENT A_ID,
COUNT(DISTINCT dNum) AS ud
FROM myDB.dbo.AQ
WHERE M > 1 and B = 0
GROUP BY A_ID ORDER BY ud DESC)
The error I am receiving is ...
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.
When I run the sub-query alone, it returns just fine, so I am assuming there is some issue with the main query?
You can't return two (or multiple) columns in your subquery to do the comparison in the WHERE A_ID IN (subquery) clause - which column is it supposed to compare A_ID to? Your subquery must only return the one column needed for the comparison to the column on the other side of the IN. So the query needs to be of the form:
SELECT * From ThisTable WHERE ThisColumn IN (SELECT ThatColumn FROM ThatTable)
You also want to add sorting so you can select just the top rows, but you don't need to return the COUNT as a column in order to sort by it; the ORDER BY clause can use an expression that is not among the columns returned by the query.
Try something like this:
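select count(distinct dNum)
from myDB.dbo.AQ
where A_ID in
-- DISTINCT is dropped: GROUP BY already yields one row per A_ID, and SQL Server
-- rejects SELECT DISTINCT with an ORDER BY expression that is not in the select list
(SELECT TOP (0.1) PERCENT A_ID
FROM myDB.dbo.AQ
WHERE M > 1 and B = 0
GROUP BY A_ID
ORDER BY COUNT(DISTINCT dNum) DESC)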
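The single-column subquery that sorts by an aggregate it does not return can be tried out with Python's sqlite3 module. This is only a sketch: SQLite has no TOP ... PERCENT, so LIMIT 1 stands in for it as an assumption, and the aq table and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE aq (a_id INTEGER, dnum INTEGER, m INTEGER, b INTEGER);
    INSERT INTO aq VALUES
        (1, 100, 2, 0), (1, 101, 2, 0), (1, 102, 3, 0),
        (2, 200, 2, 0),
        (3, 300, 1, 0);  -- filtered out by m > 1
""")

result = conn.execute("""
    SELECT COUNT(DISTINCT dnum)
    FROM aq
    WHERE a_id IN (SELECT a_id                      -- exactly one column for IN
                   FROM aq
                   WHERE m > 1 AND b = 0
                   GROUP BY a_id
                   ORDER BY COUNT(DISTINCT dnum) DESC  -- sort key is not selected
                   LIMIT 1)                         -- stand-in for TOP (0.1) PERCENT
""").fetchone()[0]

print(result)  # 3
```

The subquery picks a_id 1, which has the most distinct dnum values, and the outer query then counts that group's three distinct dnum values.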
The subquery in a WHERE ... IN clause should return only one column; when you assign the returned value to a variable, it should also return only one row. Example:
select * from table1 where Date in (select * from Dates) -- Wrong
select * from table1 where Date in (select Column1,Column2 from Dates) -- Wrong
select * from table1 where Date in (select Column1 from Dates) -- OK
It's complaining about
COUNT(DISTINCT dNum) AS ud
inside the subquery. Only one column can be returned from the subquery unless you are performing an EXISTS query. I'm not sure why you want to count the same column twice; superficially it looks redundant. The subquery here is only a filter, not a join: you use it to restrict the data, not to specify which columns to get back.
Apart from very good responses here, you could try this as well if you want to use your sub query as is.
Approach:
1) Select the desired column (Only 1) from your sub query
2) Use where to map the column name
Code:
SELECT count(distinct dNum)
FROM myDB.dbo.AQ
WHERE A_ID in
(
SELECT A_ID
FROM (SELECT DISTINCT TOP (0.1) PERCENT A_ID, COUNT(DISTINCT dNum) AS ud
FROM myDB.dbo.AQ
WHERE M > 1 and B = 0
GROUP BY A_ID ORDER BY ud DESC
) a
)
Just in case it helps someone, here's what caused this error for me:
I needed a procedure to return JSON, but I left out the FOR JSON PATH clause:
set @jsonout = (SELECT ID, SumLev, Census_GEOID, AreaName, Worksite
from CS_GEO G (nolock)
join #allids a on g.ID = a.[value]
where g.Worksite = @worksite)
When I tried to save the stored procedure, it threw the error. I fixed it by adding for json path at the end of the subquery:
set @jsonout = (SELECT ID, SumLev, Census_GEOID, AreaName, Worksite
from CS_GEO G (nolock)
join #allids a on g.ID = a.[value]
where g.Worksite = @worksite for json path)
To project more than one column in a subquery, introduce the subquery with EXISTS:
SELECT t.col1,t.col2
FROM table1 t
WHERE EXISTS (SELECT st.col1,st.col2
FROM table2 st
WHERE st.fcol = t.fcol)