Multiple counts on different tables in same query - sql

We have a tool that allows users to create their own groups. Within these groups, users can write posts. What I am trying to determine is the relationship between size of the group and total number of posts in that group.
I can do SQL statements to get a list of group names and the number of users in that group (Query 1) and a list of group names and the number of posts (Query 2) but I would like for both to be in the same query.
Query 1
select count(pg.personID) as GroupSize, g.GroupName
from Group g inner join PersonGroup pg g.GroupID = pg.GroupID
where LastViewed between #startDate and #enddate and
g.Type = 0
group by g.GroupID, g.GroupName
order by GroupSize
Query 2
select count(gp.PostID) as TotalPosts, g.GroupName
from Group g inner join GroupPost gp on g.GroupID = gp.GroupID
inner join Post p on gp.PostID = p.PostID
where g.Type = 0 and
gp.Created between #startDate and #enddate
group by g.GroupID, g.GroupName
order by TotalPosts
**Note: A person can post the same "post" to multiple groups
I believe from this data I could build a Histogram (# of groups with 10-20 users, 21-30 users, etc..) and incorporate average number of posts for groups in those different bins.

A simple solution would be to use those queries as Sub queries, and combine them:
SELECT
grps.GroupName,
grps.GroupSize,
psts.TotalPosts
FROM (
select count(pg.personID) as GroupSize, g.GroupName, g.GroupID
from Group g inner join PersonGroup pg g.GroupID = pg.GroupID
where LastViewed between #startDate and #enddate and
g.Type = 0
group by g.GroupID, g.GroupName
order by GroupSize) grps
JOIN (
select count(gp.PostID) as TotalPosts, g.GroupName, g.groupID
from Group g inner join GroupPost gp on g.GroupID = gp.GroupID
inner join Post p on gp.PostID = p.PostID
where g.Type = 0 and
gp.Created between #startDate and #enddate
group by g.GroupID, g.GroupName
order by TotalPosts) psts
ON psts.GroupID = grps.GroupID

Paul's solution assumes that the two sets of groups (by posts and by users) is the same. This may not be true, so either a full outer join or union all is needed.
My preference is the following:
with groups as
(
select *
from Group g
where g.Type = 0
and g.LastViewed between #startDate and #enddate
)
select GroupId, GroupName, SUM(GroupSize) as GroupSize, SUM(TotalPosts) as TotalPosts)
from
(
(select groups.GroupId, groups.GroupName, 1 as GroupSize, 0 as TotalPosts
from groups
join PersonGroup pg
on pg.GroupId = groups.groupId
)
union all
(select groups.GroupId, groups.GroupName, 0 as GroupSize, 1 as TotalPosts
from groups
join GroupPost gp
on groups.GroupId = gp.GroupId
join Post p
on gp.PostId = p.PostId
)
)
group by GroupId, GroupName
The "with" clause defines the set of groups that you are using. This places the definition in one place, making it obvious that the two subqueries have the same filtering. The two subqueries simply have flags indicating each of the two variables, which are then aggregated at the higher level. Sometimes it is more efficient to also do the aggregation inside the subqueries, particularly when there are indexes.

Related

converting subquery to join in postgresql

select p.postid postid,p.posttypeid posttypeid,
(select count(parentid) as no_of_answers from post
where parentid=p.postid), p.title title,
p.body body ,
p.creationdate creationdate,
p.modifieddate modifieddate,
p.modifieduserid modifieduserid,
p.score score,p.views no_of_views,
u.userid userid
from post
as p
left outer join user_quest as u on p.owneruserid = u.userid
where p.posttypeid = 1
Order by p.creationdate desc
LIMIT 10 OFFSET 0;
I need to convert his subquery to join, please help
You could use CTE:
WITH counts AS (
SELECT
count(parentid) AS no_of_answers,
parentid
FROM post
GROUP BY parentid
)
SELECT
p.postid,
p.posttypeid,
p.title,
COALESCE(c.no_of_answers, 0) AS no_of_answers,
p.body,
p.creationdate,
p.modifieddate,
p.modifieduserid,
p.score,
p.views AS no_of_views,
u.userid
FROM post AS p
LEFT JOIN counts c ON (c.parentid = p.postid)
LEFT JOIN user_quest AS u ON (p.owneruserid = u.userid)
WHERE p.posttypeid = 1
ORDER BY p.creationdate DESC
LIMIT 10 OFFSET 0;
or put yours subquery to JOIN:
SELECT
p.postid,
p.posttypeid,
p.title,
COALESCE(c.no_of_answers, 0) AS no_of_answers,
p.body,
p.creationdate,
p.modifieddate,
p.modifieduserid,
p.score,
p.views AS no_of_views,
u.userid
FROM post AS p
LEFT JOIN
(
SELECT
count(parentid) AS no_of_answers,
parentid
FROM post
GROUP BY parentid
) c ON (c.parentid = p.postid)
LEFT JOIN user_quest AS u ON (p.owneruserid = u.userid)
WHERE p.posttypeid = 1
ORDER BY p.creationdate DESC
LIMIT 10 OFFSET 0;
But I would prefer CTE. The performance of CTEs and subqueries should, in theory, be the same since both provide the same information to the query optimizer. One difference is that a CTE used more than once could be easily identified and calculated once. And it's look pretties because is easier to read, at least i thing so.

How to create distinct count from queries with several tables

I am trying to create one single query that will give me a distinct count for both the ActivityID and the CommentID. My query in MS Access looks like this:
SELECT
tbl_Category.Category, Count(tbl_Activity.ActivityID) AS CountOfActivityID,
Count(tbl_Comments.CommentID) AS CountOfCommentID
FROM tbl_Category LEFT JOIN
(tbl_Activity LEFT JOIN tbl_Comments ON
tbl_Activity.ActivityID = tbl_Comments.ActivityID) ON
tbl_Category.CategoryID = tbl_Activity.CategoryID
WHERE
(((tbl_Activity.UnitID)=5) AND ((tbl_Comments.PeriodID)=1))
GROUP BY
tbl_Category.Category;
I know the answer must somehow include SELECT DISTINCT but am not able to get it to work. Do I need to create multiple subqueries?
This is really painful in MS Access. I think the following does what you want to do:
SELECT ac.Category, ac.num_activities, aco.num_comments
FROM (SELECT ca.category, COUNT(*) as num_activities
FROM (SELECT DISTINCT c.Category, a.ActivityID
FROM (tbl_Category as c INNER JOIN
tbl_Activity as a
ON c.CategoryID = a.CategoryID
) INNER JOIN
tbl_Comments as co
ON a.ActivityID = co.ActivityID
WHERE a.UnitID = 5 AND co.PeriodID = 1
) as caa
GROUP BY ca.category
) as ca LEFT JOIN
(SELECT c.Category, COUNT(*) as num_comments
FROM (SELECT DISTINCT c.Category, co.CommentId
FROM (tbl_Category as c INNER JOIN
tbl_Activity as a
ON c.CategoryID = a.CategoryID
) INNER JOIN
tbl_Comments as co
ON a.ActivityID = co.ActivityID
WHERE a.UnitID = 5 AND co.PeriodID = 1
) as aco
GROUP BY c.Category
) as aco
ON aco.CommentId = ac.CommentId
Note that your LEFT JOINs are superfluous because the WHERE clause turns them into INNER JOINs. This adjusts the logic for that purpose. The filtering is also very tricky, because it uses both tables, requiring that both subqueries have both JOINs.
You can use DISTINCT:
SELECT
tbl_Category.Category, Count(DISTINCT tbl_Activity.ActivityID) AS CountOfActivityID,
Count(DISTINCT tbl_Comments.CommentID) AS CountOfCommentID
FROM tbl_Category LEFT JOIN
(tbl_Activity LEFT JOIN tbl_Comments ON
tbl_Activity.ActivityID = tbl_Comments.ActivityID) ON
tbl_Category.CategoryID = tbl_Activity.CategoryID
WHERE
(((tbl_Activity.UnitID)=5) AND ((tbl_Comments.PeriodID)=1))
GROUP BY
tbl_Category.Category;

Group by and Having using the same Column

I have the following code for a SSRS report using stored procedures, I want to filter the group by to display only the information concerning group level 2, but I'm not sure how to do so.
Here is the stored procedure I am using:
ALTER PROCEDURE CheckActiveMembers
AS
BEGIN
SELECT COUNT(MemberID), GroupLevel
FROM Member M
INNER JOIN [Policy] P ON P.PolicyID = M.PolicyID
INNER JOIN [Group] G ON P.GroupID = G.GroupID
WHERE GroupLevel = '2' AND M.CancellationDate IS NULL
GROUP BY GroupLevel
HAVING GroupLevel = '2'
END
Group by is used to group records and applying having works for within that specific group only on which it is applied whereas a where clause is valid for whole table itself like in your case
SELECT COUNT(MemberID),
GroupLevel
FROM Member M
INNER JOIN [Policy] P
ON P.PolicyID = M.PolicyID
INNER JOIN [Group] G
ON P.GroupID = G.GroupID
WHERE GroupLevel = '2' AND
M.CancellationDate IS NULL
GROUP BY GroupLevel
HAVING GroupLevel = '2'
The having is irrelevant here as you have already filtered records in your where clause having usually is used to check aggregation within that group
HAVING should only contain aggregated columns, non-aggregated (grouped) columns go with WHERE:
SELECT COUNT(MemberID), GroupLevel
FROM Member M
INNER JOIN [Policy] P
ON P.PolicyID = M.PolicyID
INNER JOIN [Group] G
ON P.GroupID = G.GroupID
WHERE GroupLevel = '2' AND M.CancellationDate IS NULL
GROUP BY GroupLevel
If you wanted ones where there was > 1 member, you could use the aggregated COUNT in the HAVING:
SELECT COUNT(MemberID), GroupLevel
FROM Member M
INNER JOIN [Policy] P
ON P.PolicyID = M.PolicyID
INNER JOIN [Group] G
ON P.GroupID = G.GroupID
WHERE GroupLevel = '2' AND M.CancellationDate IS NULL
GROUP BY GroupLevel
HAVING COUNT(MemberId) > 1

Remove grouped data set when total of count is zero with subquery

I'm generating a data set that looks like this
category user total
1 jonesa 0
2 jonesa 0
3 jonesa 0
1 smithb 0
2 smithb 0
3 smithb 5
1 brownc 2
2 brownc 3
3 brownc 4
Where a particular user has 0 records in all categories is it possible to remove their rows form the set? If a user has some activity like smithb does, I'd like to keep all of their records. Even the zeroes rows. Not sure how to go about that, I thought a CASE statement may be of some help but I'm not sure, this is pretty complicated for me. Here is my query
SELECT DISTINCT c.category,
u.user_name,
CASE WHEN (
SELECT COUNT(e.entry_id)
FROM category c1
INNER JOIN entry e1
ON c1.category_id = e1.category_id
WHERE c1.category_id = c.category_id
AND e.user_name = u.user_name
AND e1.entered_date >= TO_DATE ('20140625','YYYYMMDD')
AND e1.entered_date <= TO_DATE ('20140731', 'YYYYMMDD')) > 0 -- I know this won't work
THEN 'Yes'
ELSE NULL
END AS TOTAL
FROM user u
INNER JOIN role r
ON u.id = r.user_id
AND r.id IN (1,2),
category c
LEFT JOIN entry e
ON c.category_id = e.category_id
WHERE c.category_id NOT IN (19,20)
I realise the case statement won't work, but it was an attempt on how this might be possible. I'm really not sure if it's possible or the best direction. Appreciate any guidance.
Try this:
delete from t1
where user in (
select user
from t1
group by user
having count(distinct category) = sum(case when total=0 then 1 else 0 end) )
The sub query can get all the users fit your removal requirement.
count(distinct category) get how many category a user have.
sum(case when total=0 then 1 else 0 end) get how many rows with activities a user have.
There are a number of ways to do this, but the less verbose the SQL is, the harder it may be for you to follow along with the logic. For that reason, I think that using multiple Common Table Expressions will avoid the need to use redundant joins, while being the most readable.
-- assuming user_name and category_name are unique on [user] and [category] respectively.
WITH valid_categories (category_id, category_name) AS
(
-- get set of valid categories
SELECT c.category_id, c.category AS category_name
FROM category c
WHERE c.category_id NOT IN (19,20)
),
valid_users ([user_name]) AS
(
-- get set of users who belong to valid roles
SELECT u.[user_name]
FROM [user] u
WHERE EXISTS (
SELECT *
FROM [role] r
WHERE u.id = r.[user_id] AND r.id IN (1,2)
)
),
valid_entries (entry_id, [user_name], category_id, entry_count) AS
(
-- provides a flag of 1 for easier aggregation
SELECT e.[entry_id], e.[user_name], e.category_id, CAST( 1 AS INT) AS entry_count
FROM [entry] e
WHERE e.entered_date BETWEEN TO_DATE('20140625','YYYYMMDD') AND TO_DATE('20140731', 'YYYYMMDD')
-- determines if entry is within date range
),
user_categories ([user_name], category_id, category_name) AS
( SELECT u.[user_name], c.category_id, c.category_name
FROM valid_users u
-- get the cartesian product of users and categories
CROSS JOIN valid_categories c
-- get only users with a valid entry
WHERE EXISTS (
SELECT *
FROM valid_entries e
WHERE e.[user_name] = u.[user_name]
)
)
/*
You can use these for testing.
SELECT COUNT(*) AS valid_categories_count
FROM valid_categories
SELECT COUNT(*) AS valid_users_count
FROM valid_users
SELECT COUNT(*) AS valid_entries_count
FROM valid_entries
SELECT COUNT(*) AS users_with_entries_count
FROM valid_users u
WHERE EXISTS (
SELECT *
FROM user_categories uc
WHERE uc.user_name = u.user_name
)
SELECT COUNT(*) AS users_without_entries_count
FROM valid_users u
WHERE NOT EXISTS (
SELECT *
FROM user_categories uc
WHERE uc.user_name = u.user_name
)
SELECT uc.[user_name], uc.[category_name], e.[entry_count]
FROM user_categories uc
INNER JOIN valid_entries e ON (uc.[user_name] = e.[user_name] AND uc.[category_id] = e.[category_id])
*/
-- Finally, the results:
SELECT uc.[user_name], uc.[category_name], SUM(NVL(e.[entry_count],0)) AS [entry_count]
FROM user_categories uc
LEFT OUTER JOIN valid_entries e ON (uc.[user_name] = e.[user_name] AND uc.[category_id] = e.[category_id])
Here's another method:
WITH totals AS (
SELECT
c.category,
u.user_name,
COUNT(e.entry_id) AS total,
SUM(COUNT(e.entry_id)) OVER (PARTITION BY u.user_name) AS user_total
FROM
user u
INNER JOIN
role r ON u.id = r.user_id
CROSS JOIN
category c
LEFT JOIN
entry e ON c.category_id = e.category_id
AND u.user_name = e.user_name
AND e1.entered_date >= TO_DATE ('20140625', 'YYYYMMDD')
AND e1.entered_date <= TO_DATE ('20140731', 'YYYYMMDD')
WHERE
r.id IN (1, 2)
AND c.category_id IN (19, 20)
GROUP BY
c.category,
u.user_name
)
SELECT
category,
user_name,
total
FROM
totals
WHERE
user_total > 0
;
The totals derived table calculates the totals per user and category as well as totals across all categories per user (using SUM() OVER ...). The main query returns only rows where the user total is greater than zero.

SQL Select on priority of not exist

I'm quite confused as to how I'd structure a query I need:
select distinct NodeID from GroupsTable
where GroupID in
(
select GroupID from UserGroupsTable
where UserID = #UserID
)
I need to select NodeID from GroupsTable where the User belongs to multiple groups, UNLESS any group that the user belongs to does not have that NodeID.
The above code only selects a NodeID from GroupsTable where it apears in any one of the users groups.
So:
Group|Node
1|A
1|B
2|A
I need only B to be selected for example.
Any Ideas?
declare #UserID int
select G.NodeID
from GroupsTable G
inner join
(
select GroupID, count(*) over () GroupCount
from UserGroupsTable
where UserID = #UserID
) UG on UG.GroupID = G.GroupID
GROUP BY G.NodeID, UG.GroupCount
HAVING COUNT(*) != UG.GroupCount OR UG.GroupCount = 1
-- count=1 is special. it will always equal groupcount, but let it through
(assuming this is SQL Server and version >= 2005 for using the count(*) over)
#answer HAVING condition changed following comments
You can use a having clause to ensure that the count of groups per node is less than the count of all groups the user is in:
select g.NodeID
from GroupsTable g
join UserGroupsTable ug
on g.GroupID = ug.GroupID
where ug.UserID = 1
group by
ug.UserID
, g.NodeID
having COUNT(distinct g.GroupID) <
(
select COUNT(distinct GroupID)
from UserGroupsTable ug2
where ug2.UserID = ug.UserID
)