Return null values in GROUP BY query - sql

This only returns results for entries that exist, which makes sense, but I'm trying to get it to display all of the B.[HostName] entries, even if there aren't entries for all of them. I'd like it to show 0 under the count for those. I've read about needing to use a LEFT JOIN on some of the tables and changing the COUNT(*) to use a field instead, but whenever I have tried that, it still results in the same data. Can anyone show me what would need to be changed for this to work the way I mentioned?
EDIT:
To clarify, since the initial answers didn't work:
The count would be coming from this table: [Database].[dbo].[WorkItemHistory] A -- we want all of the entries of A.[PlatformId]
A.[PlatformId] is a foreign key to C.[Id]
AND C.[EngineId] is a foreign key to B.[Id]
From there, we are getting B.[HostName]
So, all entries for B.[HostName] would be listed in the output, and the count would come from the entries in the [WorkItemHistory] table (A).
We are using
SELECT DISTINCT B.[HostName], COUNT(*) AS Count
FROM [Database].[dbo].[WorkItemHistory] A, [Database].[dbo].[Engine] B, [Database].[dbo].[Platforms] C
WHERE A.[PlatformId] = C.[Id]
AND B.[Id] = C.[EngineId]
AND A.[Status] = '30'
AND A.[LastAttemptDateTime] >= CAST(GETDATE() AS Date)
GROUP BY B.[HostName]
ORDER BY COUNT(*) DESC

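Replace your implicit joins with explicit ones and RIGHT JOIN to B, like below, so that rows of B with no matching rows in the other tables are still returned. The filters on A have to sit in a join condition rather than in the WHERE clause, and the count has to use a column from A; otherwise the unmatched hosts are either filtered out or counted as 1.
SELECT B.[HostName], COUNT(A.[PlatformId]) AS Count
FROM [Database].[dbo].[WorkItemHistory] A
JOIN [Database].[dbo].[Platforms] C
ON A.[PlatformId] = C.[Id]
AND A.[Status] = '30'
AND A.[LastAttemptDateTime] >= CAST(GETDATE() AS Date)
RIGHT JOIN [Database].[dbo].[Engine] B
ON C.[EngineId] = B.[Id]
GROUP BY B.[HostName]
ORDER BY COUNT(A.[PlatformId]) DESC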

I think something like this will do it. You just need to LEFT JOIN the other tables to [Database].[dbo].[Engine], since you want all the host names. Count a column from A rather than using COUNT(*), so that hosts with no matching work items show 0.
SELECT B.[HostName],
COUNT(A.[PlatformId]) AS Count
FROM [Database].[dbo].[Engine] B
LEFT JOIN [Database].[dbo].[Platforms] C
ON B.[Id] = C.[EngineId]
LEFT JOIN [Database].[dbo].[WorkItemHistory] A
ON A.[PlatformId] = C.[Id]
AND A.[Status] = '30'
AND A.[LastAttemptDateTime] >= CAST(GETDATE() AS Date)
GROUP BY B.[HostName]
ORDER BY COUNT(A.[PlatformId]) DESC
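To see why the choice between COUNT(*) and COUNT(column) matters with the LEFT JOIN approach, here is a minimal sketch run through Python's built-in sqlite3 module. SQLite is only a stand-in for SQL Server here, the sample rows and host names are made up, and the date filter is left out to keep the data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Engine (Id INTEGER PRIMARY KEY, HostName TEXT);
    CREATE TABLE Platforms (Id INTEGER PRIMARY KEY, EngineId INTEGER);
    CREATE TABLE WorkItemHistory (Id INTEGER PRIMARY KEY, PlatformId INTEGER, Status TEXT);
    -- Hypothetical data: host-a has two matching work items,
    -- host-b has platforms but no status '30' items, host-c has no platforms.
    INSERT INTO Engine VALUES (1, 'host-a'), (2, 'host-b'), (3, 'host-c');
    INSERT INTO Platforms VALUES (10, 1), (20, 2), (21, 2);
    INSERT INTO WorkItemHistory VALUES (100, 10, '30'), (101, 10, '30'), (102, 20, '10');
""")

query = """
    SELECT B.HostName, {count_expr} AS Cnt
    FROM Engine B
    LEFT JOIN Platforms C ON B.Id = C.EngineId
    LEFT JOIN WorkItemHistory A ON A.PlatformId = C.Id AND A.Status = '30'
    GROUP BY B.HostName
    ORDER BY B.HostName
"""

# COUNT(*) counts the NULL-extended rows produced by the outer joins.
count_star = conn.execute(query.format(count_expr="COUNT(*)")).fetchall()
# COUNT(A.Id) skips rows where A.Id is NULL, so unmatched hosts get 0.
count_col = conn.execute(query.format(count_expr="COUNT(A.Id)")).fetchall()

print(count_star)
print(count_col)
```

With COUNT(*), every host shows at least 1 because the outer join still emits one NULL-filled row per unmatched host; counting a column from A is what produces the zeros.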

Related

Why is the SQL full outer join not presenting unmatched customers (avc_id)?

I appreciate your help in advance!
The right table avc_enr has 108K customers (b.avc_id) in it. In the 2nd table (alias a), we have about 97K customers (a.avc_id).
I tried to use right, left and full outer join but every time the count of customers shows 97K rather than 108K customers (under Total_users)... any idea why with full outer join the count function is not counting all customers even if no common match is found between two tables?
with avc_enr as
(
select
dt, avc_id, service_template_name
from
hive.thor_satellite.v_nms_inventory_nmsdb_avc_service
where
current_status = 'ACTIVE' and dt = 20220809
)
select
a.dt, a.metrics_date,
avg(a.vsat_fl_byte_count_kbps) as AUPU_Kbps,
count(b.avc_id) as Total_users
from
hive.thor_satellite.vda_satellite_nms_performance_smts_avc_pm_throughput a
full outer join
avc_enr b on a.avc_id = b.avc_id and a.dt = b.dt
where
a.dt = 20220809
group by
a.dt, a.metrics_date

DimDate Join to Another Table

I have a DimDate table that I want to join to another table holding the dates on which various visits took place. I have followed many threads on here, but I can't get the missing dates to appear in my result set. Attached is a screenshot of my DimDate table, and below is the script. I have tried two versions, one with the DimDate table as the main table and one with it left-joined to the other table, and neither brings through the missing dates. Basically, I am trying to bring through all months and populate them with NULL if there are no entries in my other table.
SELECT
month(d.date) as 'DimDateMonth'
,month(s.date) as 'ActivityMonth'
,year(d.date) as 'DimDateYear'
,[PCN]
,[Type of Visit]
,[Pharmacist]
,[Practice]
,count(distinct(cast(s.Date as date))) 'Number of visits'
FROM [dbo].[DimDate] as d
left join [dbo].[mytable] s on month(d.date) = month(s.date) and year(s.date) = year(d.date)
where s.Pharmacist = 'abc' and year(d.date) = '2020'
group by
month(d.date)
,month(s.date)
,year(d.date)
,[PCN]
,[Type of Visit]
,[Pharmacist]
,[Practice]
Filters on the second table in a LEFT JOIN need to be in the ON clause:
FROM [dbo].[DimDate] d LEFT JOIN
[dbo].[mytable] s
ON month(d.date) = month(s.date) AND
year(s.date) = year(d.date) AND
s.Pharmacist = 'abc'
WHERE year(d.date) = '2020'
Putting the s.Pharmacist = 'abc' in the WHERE clause filters out NULL values -- which undoes the LEFT JOIN.
Note that conditions on the first table should still go in the WHERE clause so the rows really are filtered out.
I would also recommend writing the WHERE clause as:
WHERE d.date >= '2020-01-01' AND
d.date < '2021-01-01'
This allows the query to use an index on (date) for filtering.
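The effect of moving the filter from WHERE to ON can be checked with a small sketch using Python's built-in sqlite3 module. The Visits table and its rows are hypothetical stand-ins for [dbo].[mytable], and SQLite's strftime replaces the month() and year() functions of SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimDate (date TEXT);
    CREATE TABLE Visits (date TEXT, Pharmacist TEXT);
    -- Hypothetical data: one DimDate row per month, visits in Jan and Feb only.
    INSERT INTO DimDate VALUES ('2020-01-01'), ('2020-02-01'), ('2020-03-01');
    INSERT INTO Visits VALUES ('2020-01-15', 'abc'), ('2020-02-10', 'xyz');
""")

base = """
    SELECT strftime('%m', d.date) AS DimDateMonth, COUNT(s.date) AS NumVisits
    FROM DimDate d
    LEFT JOIN Visits s
      ON strftime('%Y-%m', s.date) = strftime('%Y-%m', d.date)
      {on_filter}
    WHERE d.date >= '2020-01-01' AND d.date < '2021-01-01'
      {where_filter}
    GROUP BY strftime('%m', d.date)
    ORDER BY DimDateMonth
"""

pharmacist = "AND s.Pharmacist = 'abc'"

# Filter on the second table in WHERE: NULL-extended rows are discarded.
filter_in_where = conn.execute(
    base.format(on_filter="", where_filter=pharmacist)
).fetchall()

# Same filter in ON: months without a matching visit survive with a count of 0.
filter_in_on = conn.execute(
    base.format(on_filter=pharmacist, where_filter="")
).fetchall()

print(filter_in_where)
print(filter_in_on)
```

Only the second variant returns a row for every month in DimDate; the first behaves like an inner join.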

Group by with inner joins returning too many records

A table has a foundStatus column that's a char. I want to return a list of foundStatuses with a count next to each - this works:
SELECT foundstatus, count(foundstatus) as total
FROM findings f
WHERE findDateTime BETWEEN '2008-01-01' AND '2017-06-24 23:59:59'
group by foundstatus
order by foundstatus
I need to join several tables to build a where clause, and doing so begins to return counts that are too high. I can get this to work:
SELECT foundstatus, count(foundstatus) as total
FROM findings f left join
pets p
on f.petid = p.petid
WHERE findDateTime BETWEEN '2008-01-01' AND '2017-06-24 23:59:59'
group by foundstatus
order by foundstatus
By doing a left join, however, any subsequent joins I do (left or inner) just return too many rows (I guess because multiple records from the joined tables are being returned):
SELECT foundstatus, count(foundstatus) as total
FROM findings f left join
pets p
on f.petid = p.petid inner join
petTags pt
ON p.petID = pt.petID
WHERE findDateTime BETWEEN '2008-01-01' AND '2017-06-24 23:59:59'
group by foundstatus
order by foundstatus
I need a statement like the bottom only with 5 joined tables to return the same counts as the top 2 queries. I'm sure this is fairly easy but can find nothing on Google - how can I do it? Thx.
Assuming you have a primary key in findings you can do:
select f.foundstatus, count(distinct f.findingsId) as total
from findings f left join
pets p
on f.petid = p.petid left join
petTags pt
on p.petID = pt.petID
where f.findDateTime >= '2008-01-01' and
f.findDateTime < '2017-06-25'
group by f.foundstatus
order by foundstatus;
Often, count(distinct) is not the best way to go. My guess is that EXISTS conditions in the WHERE clause are a better way to do what you want.
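A rough sketch of the difference, using Python's built-in sqlite3 module with made-up rows. As a simplification, findings is joined straight to petTags on the pet id and the pets table and date filter are omitted; the column tagId is an assumption, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE findings (findingsId INTEGER PRIMARY KEY, petid INTEGER, foundstatus TEXT);
    CREATE TABLE petTags (tagId INTEGER PRIMARY KEY, petID INTEGER);
    -- Hypothetical data: pet 1 has three tags, pet 2 has none, pet 3 has one.
    INSERT INTO findings VALUES (1, 1, 'F'), (2, 2, 'F'), (3, 3, 'L');
    INSERT INTO petTags VALUES (1, 1), (2, 1), (3, 1), (4, 3);
""")

# Plain join: finding 1 is repeated once per tag, inflating the count.
naive = conn.execute("""
    SELECT f.foundstatus, COUNT(f.foundstatus)
    FROM findings f
    JOIN petTags pt ON pt.petID = f.petid
    GROUP BY f.foundstatus
    ORDER BY f.foundstatus
""").fetchall()

# COUNT(DISTINCT primary key) collapses the repeated rows again.
distinct = conn.execute("""
    SELECT f.foundstatus, COUNT(DISTINCT f.findingsId)
    FROM findings f
    JOIN petTags pt ON pt.petID = f.petid
    GROUP BY f.foundstatus
    ORDER BY f.foundstatus
""").fetchall()

# EXISTS filters findings without ever multiplying their rows.
exists = conn.execute("""
    SELECT f.foundstatus, COUNT(*)
    FROM findings f
    WHERE EXISTS (SELECT 1 FROM petTags pt WHERE pt.petID = f.petid)
    GROUP BY f.foundstatus
    ORDER BY f.foundstatus
""").fetchall()

print(naive, distinct, exists)
```

The EXISTS form keeps one row per finding, so an ordinary count is already correct and no de-duplication step is needed.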

How can I join 3 tables and calculate the correct sum of fields from 2 tables, without duplicate rows?

I have tables A, B, C. Table A is linked to B, and table A is linked to C. I want to join the 3 tables and find the sum of B.cost and the sum of C.clicks. However, it is not giving me the expected value, and when I select everything without the group by, it is showing duplicate rows. I am expecting the row values from B to roll up into a single sum, and the row values from C to roll up into a single sum.
My query looks like
select A.*, sum(B.cost), sum(C.clicks) from A
join B
left join C
group by A.id
having sum(cost) > 10
I tried to group by B.a_id and C.another_field_in_a also, but that didn't work.
Here is a DB fiddle with all of the data and the full query:
http://sqlfiddle.com/#!9/768745/13
Notice how the sum fields are greater than the sum of the individual tables? I'm expecting the sums to be equal, containing only the rows of the table B and C once. I also tried adding distinct but that didn't help.
I'm using Postgres. (The fiddle is set to MySQL though.) Ultimately I will want to use a having clause to select the rows according to their sums. This query will be for millions of rows.
If I understand the logic correctly, the problem is the Cartesian product caused by the two joins. Your query is a bit hard to follow, but I think the intent is better handled with correlated subqueries:
select k.*,
(select sum(cost)
from ad_group_keyword_network n
where n.event_date >= '2015-12-27' and
n.ad_group_keyword_id = 1210802 and
k.id = n.ad_group_keyword_id
) as cost,
(select sum(clicks)
from keyword_click c
where (c.date is null or c.date >= '2015-12-27') and
k.keyword_id = c.keyword_id
) as clicks
from ad_group_keyword k
where k.status = 2 ;
Here is the corresponding SQL Fiddle.
EDIT:
The subselect should be faster than the group by on the unaggregated data. However, you need the right indexes: ad_group_keyword_network(ad_group_keyword_id, event_date, cost) and keyword_click(keyword_id, date, clicks).
I found this (MySQL joining tables group by sum issue) and created a query like this
select *
from A
join (select B.a_id, sum(B.cost) as cost
from B
group by B.a_id) B on A.id = B.a_id
left join (select C.keyword_id, sum(C.clicks) as clicks
from C
group by C.keyword_id) C on A.keyword_id = C.keyword_id
group by A.id
having sum(cost) > 10
I don't know if it's efficient though. I don't know if it's more or less efficient than Gordon's. I ran both queries and this one seemed faster, 27s vs. 2m35s. Here is a fiddle: http://sqlfiddle.com/#!15/c61c74/10
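For illustration, the inflated sums and the pre-aggregation fix can be reproduced on a tiny data set with Python's built-in sqlite3 module. The three tables follow the simplified A, B, C naming from the question and the rows are invented; this says nothing about performance on millions of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, keyword_id INTEGER);
    CREATE TABLE B (a_id INTEGER, cost INTEGER);
    CREATE TABLE C (keyword_id INTEGER, clicks INTEGER);
    -- Hypothetical data: one A row with two B rows and three C rows.
    INSERT INTO A VALUES (1, 7);
    INSERT INTO B VALUES (1, 5), (1, 10);
    INSERT INTO C VALUES (7, 1), (7, 2), (7, 3);
""")

# Joining both child tables first yields 2 x 3 = 6 rows per A row,
# so each B.cost is summed three times and each C.clicks twice.
fanned_out = conn.execute("""
    SELECT A.id, SUM(B.cost), SUM(C.clicks)
    FROM A
    JOIN B ON B.a_id = A.id
    LEFT JOIN C ON C.keyword_id = A.keyword_id
    GROUP BY A.id
""").fetchall()

# Aggregating each child table in its own derived table keeps one row
# per key, so the joins can no longer multiply anything.
pre_aggregated = conn.execute("""
    SELECT A.id, B.cost, C.clicks
    FROM A
    JOIN (SELECT a_id, SUM(cost) AS cost
          FROM B GROUP BY a_id) B ON B.a_id = A.id
    LEFT JOIN (SELECT keyword_id, SUM(clicks) AS clicks
               FROM C GROUP BY keyword_id) C ON C.keyword_id = A.keyword_id
    WHERE B.cost > 10
""").fetchall()

print(fanned_out)
print(pre_aggregated)
```

Because each derived table already returns a single row per key, the outer query needs no GROUP BY, and the threshold on the summed cost can be an ordinary WHERE condition.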
Simply split the aggregate of the second table into a subquery as follows:
http://sqlfiddle.com/#!9/768745/27
select ad_group_keyword.*, SumCost, sum(keyword_click.clicks)
from ad_group_keyword
left join keyword_click on ad_group_keyword.keyword_id = keyword_click.keyword_id
left join (select ad_group_keyword.id, sum(cost) SumCost
from ad_group_keyword join ad_group_keyword_network on ad_group_keyword.id = ad_group_keyword_network.ad_group_keyword_id
where event_date >= '2015-12-27'
group by ad_group_keyword.id
having sum(cost) > 20
) Cost on Cost.id=ad_group_keyword.id
where
(keyword_click.date is null or keyword_click.date >= '2015-12-27')
and status = 2
group by ad_group_keyword.id

How to include non-matching rows?

This script is working as intended.
select a.Loc, Count(a.PID) as TotalVisit
from AccountCount as a
inner join Data as b
on a.PID = b.PID
where
cast(a.DateTime as date) between cast(b.ADateTime as date) and cast(b.DDateTime as date)
and year(a.DateTime)=2015
and month(a.DateTime)=05
group by a.Loc
order by a.Loc;
However, I need to include a few more PIDs from the Data table. These PIDs are not in the AccountCount table.
select LocID, PID
from Data
where
cast(ADateTime as date) = cast(DDateTime as date)
and year(ADateTime) = 2015
and month(ADateTime)=05
order by LocID;
In simple terms, I need to do union between the first script and the second script. I tried to right join the Data table but it didn't work.
Using the UNION ALL provided by xQbert, I get a result like this:
Loc TotalVisit
1st floor 20
2nd floor 5
3rd floor 8
1st floor 2
It needs to be
Loc TotalVisit
1st floor 22
2nd floor 5
3rd floor 8
Please help.
Thank you.
I would think a right join would work so long as the ON criteria are set up correctly and the WHERE conditions are moved into the join. Left in the WHERE clause, they turn the right join back into an inner join: the outer join produces NULL records, which the WHERE clause then excludes, negating the outer join.
UNION ALL on its own doesn't aggregate the data across the two result sets. To me the outer join is the right thing to do here; we just need to understand the data better to make it work correctly. However, with UNION ALL you could simply sum up the results using an outer query. (Now that you've given some sample data, I might be able to figure out why the outer join wasn't working.)
Using UNION ALL (I'm all about getting it working first, then improving it):
Select X.Loc, sum(X.TotalVisit) as TotalVisit
from (SELECT a.Loc as LOC, Count(a.PID) as TotalVisit
from AccountCount as a
inner join Data as b
on a.PID = b.PID
where
cast(a.DateTime as date) between cast(b.ADateTime as date) and cast(b.DDateTime as date)
group by a.Loc
UNION ALL
select LocID as LOC, count(PID)
from Data
where
cast(ADateTime as date) = cast(DDateTime as date)
GROUP BY LocID
) X
GROUP BY X.Loc
ORDER BY X.LOC
This leads me to this, which I think would work: take the first non-null value of location from AccountCount.Loc and Data.LocID and use it. Notice there is no WHERE clause.
SELECT Coalesce(A.Loc, B.LocID) as Loc, count(B.PID) as TotalVisit
FROM Data B
LEFT JOIN AccountCount A
on B.PID = A.PID
and (cast(a.DateTime as date) between cast(b.ADateTime as date) and cast(b.DDateTime as date)
OR cast(B.ADateTime as date) = cast(B.DDateTime as date))
GROUP BY Coalesce(A.Loc, B.LocID)
Order by Coalesce(A.Loc, B.LocID)
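The UNION ALL variant above can be sketched on the sample figures from the question using Python's built-in sqlite3 module. The two tables joined_counts and same_day_counts are hypothetical placeholders that hold the per-location results of the two original queries, so only the outer aggregation step is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Placeholder tables standing in for the results of the two queries.
    CREATE TABLE joined_counts (Loc TEXT, TotalVisit INTEGER);
    CREATE TABLE same_day_counts (Loc TEXT, TotalVisit INTEGER);
    INSERT INTO joined_counts VALUES ('1st floor', 20), ('2nd floor', 5), ('3rd floor', 8);
    INSERT INTO same_day_counts VALUES ('1st floor', 2);
""")

# UNION ALL alone just stacks the rows, so '1st floor' appears twice.
union_only = conn.execute("""
    SELECT Loc, TotalVisit FROM joined_counts
    UNION ALL
    SELECT Loc, TotalVisit FROM same_day_counts
""").fetchall()

# Wrapping the union in an outer query and summing per location
# merges the duplicate groups into one row each.
summed = conn.execute("""
    SELECT X.Loc, SUM(X.TotalVisit) AS TotalVisit
    FROM (
        SELECT Loc, TotalVisit FROM joined_counts
        UNION ALL
        SELECT Loc, TotalVisit FROM same_day_counts
    ) X
    GROUP BY X.Loc
    ORDER BY X.Loc
""").fetchall()

print(len(union_only))
print(summed)
```

UNION ALL is used rather than UNION so that two branches returning the same location and count are both kept before summing.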