SUM a column count from two tables - sql

I have this simple unioned query in SQL Server 2014 where I am getting counts of rows from each table, and then trying to add a TOTAL row at the bottom that will SUM the counts from both tables. I believe the problem is the LEFT OUTER JOIN on the last union seems to be only summing the totals from the first table
SELECT A.TEST_CODE, B.DIVISION, COUNT(*)
FROM ALL_USERS B, SIGMA_TEST A
WHERE B.DOMID = A.DOMID
GROUP BY A.TEST_CODE, B.DIVISION
UNION
SELECT E.TEST_CODE, F.DIVISION, COUNT(*)
FROM BETA_TEST E, ALL_USERS F
WHERE E.DOMID = F.DOMID
GROUP BY E.TEST_CODE, F.DIVISION
UNION
SELECT 'TOTAL', '', COUNT(*)
FROM (SIGMA_TEST A LEFT OUTER JOIN BETA_TEST E ON A.DOMID
= E.DOMID )
Here is a sample of the results I am getting:
I would expect the TOTAL row to display a result of 6 (2+1+3=6)
I would like to avoid using a Common Table Expression (CTE) if possible. Thanks in advance!

Since you are counting users with matching DOMIDs in the first two statements, the final statement also needs to include the ALL_USERS table. The final statement should be:
SELECT 'TOTAL', '', COUNT(*)
FROM ALL_USERS G LEFT OUTER JOIN
SIGMA_TEST H ON G.DOMID = H.DOMID
LEFT OUTER JOIN BETA_TEST I ON I.DOMID = G.DOMID
WHERE (H.TEST_CODE IS NOT NULL OR I.TEST_CODE IS NOT NULL)

I would consider doing a UNION ALL first then COUNT:
SELECT COALESCE(TEST_CODE, 'TOTAL'),
DIVISION,
COUNT(*)
FROM (
SELECT A.TEST_CODE, B.DIVISION
FROM ALL_USERS B
INNER JOIN SIGMA_TEST A ON B.DOMID = A.DOMID
UNION ALL
SELECT E.TEST_CODE, F.DIVISION
FROM BETA_TEST E
INNER JOIN ALL_USERS F ON E.DOMID = F.DOMID ) AS T
GROUP BY GROUPING SETS ((TEST_CODE, DIVISION ), ())
Using GROUPING SETS you can easily get the total, so there is no need to add a third subquery.
Note: I assume you want just one count per (TEST_CODE, DIVISION). Otherwise you have to also group on the source table as well, as in #Gareth's answer.

I think you can achieve this with a single query. It seems your test tables have similar structures, so you can union them together and join to ALL_USERS, finally, you can use GROUPING SETS to get the total
SELECT ISNULL(T.TEST_CODE, 'TOTAL') AS TEST_CODE,
ISNULL(U.DIVISION, '') AS DIVISION,
COUNT(*)
FROM ALL_USERS AS U
INNER JOIN
( SELECT DOMID, TEST_CODE, 'SIGNMA' AS SOURCETABLE
FROM SIGMA_TEST
UNION ALL
SELECT DOMID, TEST_CODE, 'BETA' AS SOURCETABLE
FROM BETA_TEST
) AS T
ON T.DOMID = U.DOMID
GROUP BY GROUPING SETS ((T.TEST_CODE, U.DIVISION, T.SOURCETABLE), ());
As an aside, the implicit join syntax you are using was replaced over a quarter of a century ago in ANSI 92. It is not wrong, but there seems to be little reason to continue to use it, especially when you are mixing and matching with explicit outer joins and implicit inner joins. Anyone else that might read your SQL will certainly appreciate consistency.

Related

How to get a result set containing the absence of a value?

Scenario: Have a table with four columns. District_Number, District_name, Data_Collection_Week, enrollments. Each week we get data, BUT sometimes we do not.
Task: My supervisor wants me to produce a query that will let us know, which districts did not submit a given week.
What I have tried is below, but I cannot get a NULL value on those that did not submit a week.
SELECT DISTINCT DistrictNumber, DistrictName, DataCollectionWeek
into #test4
FROM EDW_REQUESTS.INSTRUCTION_DELIVERY_ENROLLMENT_2021
order by DistrictNumber, DataCollectionWeek asc
select DISTINCT DataCollectionWeek
into #test5
from EDW_REQUESTS.INSTRUCTION_DELIVERY_ENROLLMENT_2021
order by DataCollectionWeek
select b.DistrictNumber, b.DistrictName, b.DataCollectionWeek
from #test5 a left outer join #test4 b on (a.DataCollectionWeek = b.DataCollectionWeek)
order by b.DistrictNumber, b.DataCollectionWeek asc
One option uses a cross join of two select distinct subqueries to generate all possible combinations of districts and weeks, and then not exists to identify those that are not available in the table:
select d.districtnumber, w.datacollectionweek
from (select distinct districtnumber from edw_requests.instruction_delivery_enrollment_2021) d
cross join (select distinct datacollectionweek from edw_requests.instruction_delivery_enrollment_2021) w
where not exists (
select 1
from edw_requests.instruction_delivery_enrollment_2021 i
where i.districtnumber = d.districtnumber and i.datacollectionweek = w.datacollectionweek
)
This would be simpler (and much more efficient) if you had referential tables to store the districts and weeks: you would then use them directly instead of the select distinct subqueries.

Query all columns of table1 left join and count of the table2

I couldn't get this query working :
DOESN'T WORK
select
Region.*, count(secteur.*) count
from
Region
left join
secteur on secteur.region_id = Region.id
The solution I found is this but is there a better solution using joins or if this doesn't affect performance, because I have a very large dataset of about 500K rows
WORKS BUT AFRAID OF PERFORMANCE ISSUES
select
Region.*,
(select count(*)
from Secteur
where Secteur.Region_id = region.id) count
from
Region
I would suggest:
select region.*, count(secteur.region_id) as count
from region left join secteur on region.id = secteur.region_id
group by region.id, region.field2, region.field3....
Note that count(table.field) will ignore nulls, whereas count(*) will include them.
Alternatively, left join on a subquery and use coalesce to avoid nulls:
select region.*, coalesce(t.c, 0) as count
from region left join
(select region_id, count(*) as c from secteur group by region_id) t on region.id = t.region_id
I'd join region on an aggregate query of secteur:
SELECT r.*, COALESCE(s.cnt, 0)
FROM region r
LEFT JOIN (SELECT region_id, COUNT(*) AS cnt
FROM secteur
GROUP BY region_id) s ON s.region_id = r.id
I would go with this query:
select r.*,
(select count(*)
from Secteur s
where s.Region_id = r.id
) as num_secteurs
from Region r;
Then fix the performance problem by adding an index on Secteur(region_id):
create index idx_secteur_region on secteur(region_id);
You make a two mistakes
First: you have try to calulate COUNT() in only one (I mean, the second) table. This doesn't will work because theCOUNT(), like an any aggregate function, calculates only for the whole set of rows, not just for any part of the set (not only just for the one or an other joined table).
In your first query, you may replace secteur. * only by asterisk, like a Region.region_id, count(*) AS count, and do not forget add Region.region_id on the GROUP BY step.
Second: You has define not only aggregate function in the query, but and other fields: select Region.*, but you don't define them in GROUP BY step. You need to add to GROUP BY statement all columns, which you has define in the SELECT step but not apply an aggregate functions to them.
Append: not, GROUP BY Region.* doesn't will work, you should to define a columns in the GROUP BY step by their actual names.
So, correct form of this will looks like a
SELECT
Region.col1
,Region.col2,
, count(*) count
from Region
left join
secteur on secteur.region_id = Region.id
GROUP BY Region.col1, Region.col2
Or, if you don't want to type each name of column, use window queries
SELECT
Region.*,
, count( * ) OVER (PARTITION BY region_id) AS count
from Region
left join
secteur on secteur.region_id = Region.id

How can I join 3 tables and calculate the correct sum of fields from 2 tables, without duplicate rows?

I have tables A, B, C. Table A is linked to B, and table A is linked to C. I want to join the 3 tables and find the sum of B.cost and the sum of C.clicks. However, it is not giving me the expected value, and when I select everything without the group by, it is showing duplicate rows. I am expecting the row values from B to roll up into a single sum, and the row values from C to roll up into a single sum.
My query looks like
select A.*, sum(B.cost), sum(C.clicks) from A
join B
left join C
group by A.id
having sum(cost) > 10
I tried to group by B.a_id and C.another_field_in_a also, but that didn't work.
Here is a DB fiddle with all of the data and the full query:
http://sqlfiddle.com/#!9/768745/13
Notice how the sum fields are greater than the sum of the individual tables? I'm expecting the sums to be equal, containing only the rows of the table B and C once. I also tried adding distinct but that didn't help.
I'm using Postgres. (The fiddle is set to MySQL though.) Ultimately I will want to use a having clause to select the rows according to their sums. This query will be for millions of rows.
If I understand the logic correctly, the problem is the Cartesian product caused by the two joins. Your query is a bit hard to follow, but I think the intent is better handled with correlated subqueries:
select k.*,
(select sum(cost)
from ad_group_keyword_network n
where n.event_date >= '2015-12-27' and
n.ad_group_keyword_id = 1210802 and
k.id = n.ad_group_keyword_id
) as cost,
(select sum(clicks)
from keyword_click c
where (c.date is null or c.date >= '2015-12-27') and
k.keyword_id = c.keyword_id
) as clicks
from ad_group_keyword k
where k.status = 2 ;
Here is the corresponding SQL Fiddle.
EDIT:
The subselect should be faster than the group by on the unaggregated data. However, you need the right indexes: ad_group_keyword_network(ad_group_keyword_id, ad_group_keyword_id, event_date, cost) and keyword_click(keyword_id, date, clicks).
I found this (MySQL joining tables group by sum issue) and created a query like this
select *
from A
join (select B.a_id, sum(B.cost) as cost
from B
group by B.a_id) B on A.id = B.a_id
left join (select C.keyword_id, sum(C.clicks) as clicks
from C
group by C.keyword_id) C on A.keyword_id = C.keyword_id
group by A.id
having sum(cost) > 10
I don't know if it's efficient though. I don't know if it's more or less efficient than Gordon's. I ran both queries and this one seemed faster, 27s vs. 2m35s. Here is a fiddle: http://sqlfiddle.com/#!15/c61c74/10
Simply split the aggregate of the second table into a subquery as follows:
http://sqlfiddle.com/#!9/768745/27
select ad_group_keyword.*, SumCost, sum(keyword_click.clicks)
from ad_group_keyword
left join keyword_click on ad_group_keyword.keyword_id = keyword_click.keyword_id
left join (select ad_group_keyword.id, sum(cost) SumCost
from ad_group_keyword join ad_group_keyword_network on ad_group_keyword.id = ad_group_keyword_network.ad_group_keyword_id
where event_date >= '2015-12-27'
group by ad_group_keyword.id
having sum(cost) > 20
) Cost on Cost.id=ad_group_keyword.id
where
(keyword_click.date is null or keyword_click.date >= '2015-12-27')
and status = 2
group by ad_group_keyword.id

Querying a SQL Query for Distinct Records

I have a SQL Query that joins 3 tables and pulls 3 total columns out of them. I was try to figure out how to Query that Query to get distinct records from only one of the columns. Here is what I have so far
Select Distinct Make.NAME
From
(
Select MakeModel.MAKE_ID, Make.NAME, Vehicle.MODEL_YR
From NagsInfo.dbo.Make
INNER JOIN NagsInfo.dbo.MakeModel
on Make.MAKE_ID = MakeModel.MAKE_ID
INNER JOIN NagsInfo.dbo.Vehicle
on MakeModel.MAKE_MODEL_ID = Vehicle.MAKE_MODEL_ID
Where Vehicle.MODEL_YR = #YEAR
)
I keep getting multiple different syntax errors, I believe the most recent one telling me that the parentheses were incorrect, but everywhere I looked they are required for Sub queries.
Why use a subquery at all? Why not just do:
SELECT DISTINCT M.NAME
FROM NagsInfo.dbo.Make M
JOIN NagsInfo.dbo.MakeModel MM ON M.MAKE_ID = MM.MAKE_ID
JOIN NagsInfo.dbo.Vehicle V ON MM.MAKE_MODEL_ID = V.MAKE_MODEL_ID
WHERE V.MODEL_YR = #YEAR

Getting difference of two counts in SQL

I'm doing some QA in Netezza and I need to compare the counts from two separate SQL statements. This is the SQL that I am currently using
SELECT COUNT(*) AS RECORD_COUNT
FROM db..EXT_ACXIOM_WUL_FILE A
LEFT JOIN (select distinct CURRENTLY_OPTED_IN_FL,mid_key from db..F_EMAIL) B
ON A.MID_KEY=B.MID_KEY
MINUS
SELECT COUNT(*)
FROM db..EXT_ACXIOM_WUL_FILE A
However, it seems like MINUS doesn't work like that. When the counts match, instead of returning 0, this will return null for Record_count. I basically the record count to be computed as:
record_count=count1-count2
So it is 0 if the counts are equal or the difference otherwise. What is the correct SQL for this?
SELECT
(
SELECT COUNT(*) AS RECORD_COUNT
FROM db..EXT_ACXIOM_WUL_FILE A
LEFT JOIN (select distinct CURRENTLY_OPTED_IN_FL,mid_key from db..F_EMAIL) B
ON A.MID_KEY=B.MID_KEY
) -
(
SELECT COUNT(*)
FROM db..EXT_ACXIOM_WUL_FILE A
) TotalCount
Oracle's MINUS (EXCEPT in SQL Server) is a whole different animal :)
If you understand UNION and then think sets, you will understand MINUS / EXCEPT
MINUS is set difference, not for arithmetic operations.
You could do
SELECT COUNT(*) - (SELECT COUNT(*)
FROM db..EXT_ACXIOM_WUL_FILE A) AS Val
FROM db..EXT_ACXIOM_WUL_FILE A
LEFT JOIN (select distinct CURRENTLY_OPTED_IN_FL,
mid_key
from db..F_EMAIL) B
ON A.MID_KEY = B.MID_KEY
Or another option
SELECT COUNT(*) - COUNT(DISTINCT A.PrimaryKey) AS Val
FROM db..EXT_ACXIOM_WUL_FILE A
LEFT JOIN (select distinct CURRENTLY_OPTED_IN_FL,
mid_key
from db..F_EMAIL) B
ON A.MID_KEY = B.MID_KEY
I think this may be what you are looking for
SELECT COUNT(distinct(CURRENTLY_OPTED_IN_FL + F_EMAIL.MID_KEY)) - count(distinct(EXT_ACXIOM_WUL_FILE.MID_KEY))
FROM EXT_ACXIOM_WUL_FILE
LEFT OUTER JOIN F_EMAIL
ON JOIN F_EMAIL.MID_KEY = EXT_ACXIOM_WUL_FILE.MID_KEY