MSSQL: GROUP BY and Count - sql

I have 3 tables that contain info about users. I would like to find how many of each type of item in each bucket a person has.
I'm not grasping why this wouldn't work. Does table join order matter, or is there something I'm not aware of?
A sample of the tables:
PERSONS
ID  NAME
1   John
2   Jane
BUCKETS
ID  LABEL     PERSONID
1   Random    1
2   Vacation  1
THINGS
ID  BUCKETID  TYPE   VALUE
1   1         Image  abc12
2   1         Image  abc13
3   1         Video  abc34
4   1         Image  def12
5   1         Video  def34
SELECT P.NAME, B.LABEL, T.TYPE, COUNT(T.TYPE)
FROM PERSONS P
LEFT JOIN BUCKETS B ON
B.PERSONID = P.ID
LEFT JOIN THING T ON
T.BUCKETID = B.ID
GROUP BY P.NAME, B.LABEL, T.TYPE
I expect it to return:
John, Random, Images, 3
John, Random, Videos, 2
But it returns:
John, Random, Images, 5
John, Random, Videos, 5
I have tried COUNT(*) which results in the same and COUNT(DISTINCT T.TYPE) which of course returns 1 as the count.
This works perfectly in MySQL. Fiddle here: https://www.db-fiddle.com/f/vcb3wiPMSAFBXrWbgYxuMH/8
MSSQL is a different beast altogether.

I think you want count(distinct) of some sort. I would speculate:
SELECT P.NAME, B.LABEL, T.TYPE, COUNT(DISTINCT T.BUCKETID)
FROM PERSONS P LEFT JOIN
BUCKETS B
ON B.PERSONID = P.ID LEFT JOIN
THING T
ON T.BUCKETID = B.ID
GROUP BY P.NAME, B.LABEL, T.TYPE;
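As a quick check of what each aggregate actually counts, here is a small sketch that loads the sample rows into an in-memory SQLite database (an assumption made for convenience; this is not SQL Server, and the table is named THINGS as in the sample data). On these rows COUNT(T.TYPE) gives 3 and 2 per group, while COUNT(DISTINCT T.BUCKETID) gives the number of distinct buckets in each group.

```python
import sqlite3

# In-memory SQLite stand-in for the sample data (not SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PERSONS (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE BUCKETS (ID INTEGER PRIMARY KEY, LABEL TEXT, PERSONID INTEGER);
CREATE TABLE THINGS  (ID INTEGER PRIMARY KEY, BUCKETID INTEGER, TYPE TEXT, VALUE TEXT);
INSERT INTO PERSONS VALUES (1, 'John'), (2, 'Jane');
INSERT INTO BUCKETS VALUES (1, 'Random', 1), (2, 'Vacation', 1);
INSERT INTO THINGS VALUES
    (1, 1, 'Image', 'abc12'), (2, 1, 'Image', 'abc13'), (3, 1, 'Video', 'abc34'),
    (4, 1, 'Image', 'def12'), (5, 1, 'Video', 'def34');
""")

# Both aggregates side by side, grouped the same way as in the question.
rows = conn.execute("""
SELECT P.NAME, B.LABEL, T.TYPE,
       COUNT(T.TYPE)              AS things,
       COUNT(DISTINCT T.BUCKETID) AS distinct_buckets
FROM PERSONS P
LEFT JOIN BUCKETS B ON B.PERSONID = P.ID
LEFT JOIN THINGS  T ON T.BUCKETID = B.ID
GROUP BY P.NAME, B.LABEL, T.TYPE
ORDER BY P.NAME, B.LABEL, T.TYPE
""").fetchall()

for row in rows:
    print(row)
```

People and buckets without any things still show up, with NULL in the unmatched columns and a count of 0, because of the LEFT JOINs.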

Related

SQL get table1 names with a count of table2 and table3

I have three tables, table1 is connected to table2 and table3, but table2 and table3 are not connected. I need an output count of table2 and table3 for each table1 row. I have to use joins and a group by table1.name
SELECT Tb_Product.Name, count(TB_Offers.Prod_ID) 'Number of Offers', count(Tb_Requests.Prod_ID) 'Number of Requests'
FROM Tb_Product LEFT OUTER JOIN
Tb_Requests ON Tb_Product.Prod_ID = Tb_Requests.Prod_ID LEFT OUTER JOIN
TB_Offers ON Tb_Product.Prod_ID = TB_Offers.Prod_ID
GROUP BY Tb_Product.Name
I need to combine these queries:
SELECT Tb_Product.[Name], count(TB_Offers.Prod_ID) 'Number of Offers'
FROM Tb_Product LEFT OUTER JOIN
TB_Offers ON Tb_Product.Prod_ID = TB_Offers.Prod_ID
GROUP BY Tb_Product.[Name]
SELECT Tb_Product.[Name], count(Tb_Requests.Prod_ID) 'Number of Requests'
FROM Tb_Product LEFT OUTER JOIN
Tb_Requests ON Tb_Product.Prod_ID = Tb_Requests.Prod_ID
GROUP BY Tb_Product.[Name]
Results:
Name Number of Offers
Airplane 6
Auto 5
Bike 3
Camera 0
Computer 12
Milk 4
Oil 4
Orange 6
Telephone 0
Truck 6
TV 4
Name Number of Requests
Airplane 1
Auto 5
Bike 0
Camera 2
Computer 6
Milk 4
Oil 5
Orange 6
Telephone 0
Truck 1
TV 5
My results for offers and requests are the same value. I am not sure what I am doing wrong with the joins. Do I need to somehow join product to request and separately join product to offers? This needs to be done in one query.
This is for a class. Explanation would also be appreciated.
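The simplest way to do this is to count the distinct key values of each child table:
-- Offer_ID and Request_ID are assumed key column names (as in the example below); substitute your own.
SELECT
Tb_Product.Name,
count(distinct TB_Offers.Offer_ID) 'Number of Offers',
count(distinct Tb_Requests.Request_ID) 'Number of Requests'
FROM
Tb_Product
LEFT OUTER JOIN
Tb_Requests ON Tb_Product.Prod_ID = Tb_Requests.Prod_ID
LEFT OUTER JOIN
TB_Offers ON Tb_Product.Prod_ID = TB_Offers.Prod_ID
GROUP BY
Tb_Product.Name
This is necessary because consecutive joins produce a rowset that combines all the input relations, so each offer row is repeated once per matching request and vice versa. COUNT(column) counts every non-null value in that column, duplicates included, whereas COUNT(DISTINCT column) counts each value once. The distinct count has to be taken over a column that identifies the child row; within one product group Prod_ID has only a single value, so counting distinct Prod_ID values would return at most 1.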
You can also do something like this, which aggregates the counts from the child tables independently and then joins them to the base table:
SELECT
p.Name,
COALESCE(o.cnt, 0) as Offer_Count, -- products with no offers get 0 instead of NULL
COALESCE(r.cnt, 0) as Request_Count
FROM
TB_Product p
LEFT OUTER JOIN
(SELECT Prod_ID, COUNT(1) cnt FROM TB_Offers GROUP BY Prod_ID) o
ON o.Prod_ID = p.Prod_ID
LEFT OUTER JOIN
(SELECT Prod_ID, COUNT(1) cnt FROM TB_Requests GROUP BY Prod_ID) r
ON r.Prod_ID = p.Prod_ID
More explanation...
Let's say you have two products:
Prod_ID  Name
1        Widget
2        Gizmo
And two offers, one for each product:
Offer_ID  Prod_ID
100       1
200       2
And two requests for each product:
Request_ID  Prod_ID
1001        1
1002        1
2001        2
2002        2
Now when you join the Product relation to the Offer relation on Prod_ID, you get a result like this:
Prod_ID  Name    Offer_ID  Prod_ID
1        Widget  100       1
2        Gizmo   200       2
Now when you join that relation to Requests on Prod_ID, you get something like this:
Prod_ID  Name    Offer_ID  Prod_ID  Request_ID  Prod_ID
1        Widget  100       1        1001        1
1        Widget  100       1        1002        1
2        Gizmo   200       2        2001        2
2        Gizmo   200       2        2002        2
Now when you count any of these columns you get 4, because each column has 4 non-null values (2 per product once you group by product), even though there are only 2 offers in total.
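To make the row multiplication concrete, here is a minimal sketch of the Widget/Gizmo example above using an in-memory SQLite database. The table names product, offer and request are made up for the illustration, and SQLite stands in for whatever engine you are actually using. It compares the plain counts, the distinct-key counts, and the pre-aggregated subqueries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (Prod_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE offer   (Offer_ID INTEGER PRIMARY KEY, Prod_ID INTEGER);
CREATE TABLE request (Request_ID INTEGER PRIMARY KEY, Prod_ID INTEGER);
INSERT INTO product VALUES (1, 'Widget'), (2, 'Gizmo');
INSERT INTO offer   VALUES (100, 1), (200, 2);
INSERT INTO request VALUES (1001, 1), (1002, 1), (2001, 2), (2002, 2);
""")

# Plain COUNT over the doubly joined rowset: each offer is repeated per request.
naive = conn.execute("""
SELECT p.Name, COUNT(o.Prod_ID), COUNT(r.Prod_ID)
FROM product p
LEFT JOIN offer   o ON o.Prod_ID = p.Prod_ID
LEFT JOIN request r ON r.Prod_ID = p.Prod_ID
GROUP BY p.Name ORDER BY p.Name
""").fetchall()

# COUNT(DISTINCT key) collapses the repeated child rows again.
distinct_keys = conn.execute("""
SELECT p.Name, COUNT(DISTINCT o.Offer_ID), COUNT(DISTINCT r.Request_ID)
FROM product p
LEFT JOIN offer   o ON o.Prod_ID = p.Prod_ID
LEFT JOIN request r ON r.Prod_ID = p.Prod_ID
GROUP BY p.Name ORDER BY p.Name
""").fetchall()

# Aggregate each child table first, then join one row per product.
pre_aggregated = conn.execute("""
SELECT p.Name, COALESCE(o.cnt, 0), COALESCE(r.cnt, 0)
FROM product p
LEFT JOIN (SELECT Prod_ID, COUNT(*) AS cnt FROM offer   GROUP BY Prod_ID) o
       ON o.Prod_ID = p.Prod_ID
LEFT JOIN (SELECT Prod_ID, COUNT(*) AS cnt FROM request GROUP BY Prod_ID) r
       ON r.Prod_ID = p.Prod_ID
ORDER BY p.Name
""").fetchall()

print(naive)           # [('Gizmo', 2, 2), ('Widget', 2, 2)]
print(distinct_keys)   # [('Gizmo', 1, 2), ('Widget', 1, 2)]
print(pre_aggregated)  # [('Gizmo', 1, 2), ('Widget', 1, 2)]
```

The pre-aggregated form never builds the multiplied rowset at all, which is why it does not need DISTINCT.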

Join 2 tables where one table may or may not have an entry

I have 2 tables (person + activities)
Person
id  name
1   John
2   Axel
3   William
Activities
activity_id  person_id  activity_type
1            1          Login
2            1          Visited Website
3            1          Logout
4            3          Login
5            3          Logout
As you can see, John and William both have several activities, but Axel has no activities at all.
The result I am trying to achieve is as follows.
I want to select id and name from every entry of the person table, and the activity_id and activity_type from the activity table.
If the person has no activities yet, the id and name of the person should still be shown.
And if the person has more than one activity, only the one with the highest id should be shown.
The result I am aiming for:
id  name     activity_id  activity_type
1   John     3            Logout
2   Axel     null         null
3   William  5            Logout
When I try a left join:
select p.id, p.name, a.activity_id, a.activity_type
from person p left join activity a on p.id = a.person_id
order by p.id
I get this result:
id  name     activity_id  activity_type
1   John     1            Login
1   John     2            Visited Website
1   John     3            Logout
2   Axel     null         null
3   William  4            Login
3   William  5            Logout
But as I only want one entry per person, I added the following where-clause:
select p.id, p.name, a.activity_id, a.activity_type
from person p left join activity a on p.id = a.person_id
where a.id = (select max(id) from activity a2 where a2.person_id = p.id)
order by p.id
This is the result:
id  name     activity_id  activity_type
1   John     3            Logout
3   William  5            Logout
The entries for John and William are as I wanted. Only the 'last' activity is shown.
The problem is that Axel is not shown anymore.
Any help appreciated.
Many thanks in advance!
In Postgres the simplest and fastest method is usually to use a Postgres extension to SQL, distinct on:
select distinct on (p.id) p.id, p.name, a.activity_id, a.activity_type
from person p left join
activity a
on p.id = a.person_id
order by p.id, a.activity_id desc;
distinct on returns one row for each combination of the keys specified in the parentheses. The specific row is determined by the order by.
For performance, I would suggest writing the query like this:
select p.id, p.name, a.activity_id, a.activity_type
from person p left join
(select distinct on (a.person_id) a.*
from activity a
order by a.person_id, a.activity_id desc
) a
on p.id = a.person_id;
And then creating an index on activity(person_id, activity_id desc).
Another option, which might be more performant with the correct indexes in place, is to use a window function to identify the latest activity per person:
select p.id, p.name, a.activity_id, a.activity_type
from person p
left join (
select *, Row_Number() over(partition by person_id order by activity_id desc) rn
from activities
)a on a.person_id=p.id and a.rn=1
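Here is a small sketch of why Axel disappears and how the window-function version keeps him, run against an in-memory SQLite database rather than Postgres (so distinct on is not available; ROW_NUMBER needs SQLite 3.25 or later). The table is called activity with an activity_id column, which is an assumption based on the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE activity (activity_id INTEGER PRIMARY KEY, person_id INTEGER, activity_type TEXT);
INSERT INTO person VALUES (1, 'John'), (2, 'Axel'), (3, 'William');
INSERT INTO activity VALUES
    (1, 1, 'Login'), (2, 1, 'Visited Website'), (3, 1, 'Logout'),
    (4, 3, 'Login'), (5, 3, 'Logout');
""")

select_from = """
SELECT p.id, p.name, a.activity_id, a.activity_type
FROM person p
"""
latest = "(SELECT MAX(a2.activity_id) FROM activity a2 WHERE a2.person_id = p.id)"

# Filter in WHERE: Axel's row has a.activity_id = NULL, the comparison is not true,
# so the row is discarded and the LEFT JOIN behaves like an INNER JOIN.
in_where = conn.execute(
    select_from
    + "LEFT JOIN activity a ON a.person_id = p.id "
    + "WHERE a.activity_id = " + latest + " ORDER BY p.id"
).fetchall()

# Same condition moved into ON: it only restricts which activity rows match,
# so persons without a match are kept with NULLs.
in_on = conn.execute(
    select_from
    + "LEFT JOIN activity a ON a.person_id = p.id AND a.activity_id = " + latest
    + " ORDER BY p.id"
).fetchall()

# Window-function variant: number each person's activities newest first, keep rn = 1.
with_row_number = conn.execute(
    select_from
    + """LEFT JOIN (
        SELECT activity_id, person_id, activity_type,
               ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY activity_id DESC) AS rn
        FROM activity
    ) a ON a.person_id = p.id AND a.rn = 1
    ORDER BY p.id"""
).fetchall()

print(in_where)         # Axel is missing
print(in_on)            # Axel is kept with NULLs
print(with_row_number)  # same rows as in_on
```

The general rule this illustrates: conditions on the right-hand table of a left join belong in the ON clause (or in a derived table) if unmatched left-hand rows should survive.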

How to get a subset of a table using the count operator

SQL Server 2012.
Each enterprise has one or more teams. Each team may or may not have sponsors.
Enterprise
Id Name
1 A
2 B
3 C
and the team table:
Team
Id Name EnterpriseId
1 For 1
2 Xor 2
3 Nor 2
4 Xur 1
5 Fir 3
6 Fte 2
and now the table sponsor
Sponsor
id Name TeamId
1 XX1 1
2 FC7 1
3 89U 3
Now I need to produce a table that shows only the enterprises that have at least one sponsor.
FINAL TABLE
Id Name
1 A
3 C
The enterprise B has 3 teams, but there are no sponsors for those 3 teams, so I want to show the enterprises that have sponsors which are "A" and "C".
Select A.id, A.name
FROM Enterprise A
LEFT JOIN Team B on A.Id=b.EnterpriseId
INNER JOIN Sponsor C on B.Id=C.TeamId
Where (SELECT COUNT(*) FROM Sponsor S INNER JOIN Team T on T.id=S.TeamId group by T.id)>0
This is not working. I am not used to working with subsets, which is likely the way to achieve the desired table. Thanks.
You can do this with JOINs. The GROUP BY is just to eliminate duplicates:
SELECT e.id, e.name
FROM Enterprise e JOIN
Team t
ON e.Id = t.EnterpriseId JOIN
Sponsor s
ON t.Id = s.TeamId
GROUP BY e.id, e.name;
The JOIN only matches teams that have sponsors.
If you were looking for more than one, then something like HAVING COUNT(*) > 1 would be called for.
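A minimal sketch of the JOIN plus GROUP BY approach, using the sample rows from the question in an in-memory SQLite database (SQLite is an assumption here; the question is about SQL Server 2012). One thing to be aware of: with the rows exactly as posted, sponsor 89U belongs to team 3 (Nor), which is in enterprise 2, so the query returns A and B rather than the A and C shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Enterprise (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Team (Id INTEGER PRIMARY KEY, Name TEXT, EnterpriseId INTEGER);
CREATE TABLE Sponsor (Id INTEGER PRIMARY KEY, Name TEXT, TeamId INTEGER);
INSERT INTO Enterprise VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO Team VALUES
    (1, 'For', 1), (2, 'Xor', 2), (3, 'Nor', 2),
    (4, 'Xur', 1), (5, 'Fir', 3), (6, 'Fte', 2);
INSERT INTO Sponsor VALUES (1, 'XX1', 1), (2, 'FC7', 1), (3, '89U', 3);
""")

base_query = """
SELECT e.Id, e.Name
FROM Enterprise e
JOIN Team t    ON t.EnterpriseId = e.Id
JOIN Sponsor s ON s.TeamId = t.Id
GROUP BY e.Id, e.Name
"""

# Inner joins drop enterprises without any sponsored team; GROUP BY removes duplicates.
with_sponsor = conn.execute(base_query + " ORDER BY e.Id").fetchall()

# After the joins there is one row per sponsor, so COUNT(*) is the sponsor count.
more_than_one = conn.execute(base_query + " HAVING COUNT(*) > 1 ORDER BY e.Id").fetchall()

print(with_sponsor)   # [(1, 'A'), (2, 'B')]
print(more_than_one)  # [(1, 'A')]
```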

PostgreSQL - Selecting count of unique values in one and two columns

Firstly, I'd like to apologise for the ambiguous title (I promise to revise it once I'm actually aware of the problem I'm trying to solve!)
I have two tables, player and match, which look like the following:
player:
id name
-- ----
1 John
2 James
3 April
4 Jane
5 Katherine
match:
id winner loser
-- ------ -----
1 1 2
2 3 4
Records in the match table represent a match between two players, where the id column is generated by the database, and the values in the winner and loser columns reference the id column in the player table.
I want to run a query which spits out the following:
player.id player.name total_wins total_matches
--------- ----------- ---------- -------------
1 John 1 1
2 James 0 1
3 April 1 1
4 Jane 0 1
5 Katherine 0 0
I currently have a query which retrieves total_wins, but I'm not sure how to get the total_matches count on top of that.
select p.id, p.name, count(m.winner)
from player p left join match m on p.id = m.winner
group by p.id, p.name;
Thanks for your help!
Try
select p.id, p.name,
sum(case when m.winner = p.id then 1 else 0 end) as total_wins,
count(m.id) as total_matches
from player p
left join match m on p.id in ( m.winner, m.loser )
group by p.id, p.name;
One method splits the match table, so you have a single row for each win and loss. The rest is just a left join and aggregation:
select p.id, p.name, coalesce(sum(win), 0) as win, count(m.id) as total_matches
from player p left join
(select match, winner as id, 1 as win, 0 as loss from match
union all
select match, loser as id, 0 as win, 1 as loss from match
) m
on p.id = m.id
group by p.id, p.name;
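A small sketch of the single-join variant (joining on p.id in (m.winner, m.loser)) with the sample data, run against in-memory SQLite instead of PostgreSQL. The table name is quoted as "match" because MATCH is a keyword in SQLite, and the CASE has an ELSE 0 so that players without a win get 0 rather than NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE "match" (id INTEGER PRIMARY KEY, winner INTEGER, loser INTEGER);
INSERT INTO player VALUES
    (1, 'John'), (2, 'James'), (3, 'April'), (4, 'Jane'), (5, 'Katherine');
INSERT INTO "match" VALUES (1, 1, 2), (2, 3, 4);
""")

# Each player is joined to every match they took part in, as winner or as loser.
# COUNT(m.id) ignores the all-NULL row produced for players with no matches.
rows = conn.execute("""
SELECT p.id, p.name,
       SUM(CASE WHEN m.winner = p.id THEN 1 ELSE 0 END) AS total_wins,
       COUNT(m.id) AS total_matches
FROM player p
LEFT JOIN "match" m ON p.id IN (m.winner, m.loser)
GROUP BY p.id, p.name
ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```

This reproduces the table asked for in the question, including the 0/0 row for Katherine.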

SQL Query not working as it should

So I have three tables:
authors:
--------
ID Name
1 John
2 Sue
3 Mike
authors_publications:
---------------------
AuthorID PaperID
1 1
1 2
2 2
3 1
3 2
3 3
publications:
-------------
ID year
1 2004
2 2005
3 2004
I'm trying to join them so that I count the number of publications each author had in 2004. If they didn't publish anything then it should be zero.
ideally the result should look like this:
ID Name Publications_2004
1 John 1
2 Sue 0
3 Mike 2
I tried the following:
select a.ID, Name, count(*) as Publications_2004
from authors_publications as ap left join authors as a on ap.AuthorID=a.ID left join publications as p on p.ID=ap.PaperID
where year=2004
group by ap.AuthorID
I don't understand why it's not working. It's completely removing any authors that haven't published in 2004.
Your WHERE clause takes the result set returned from the JOINs and then trims off the records where year<>2004 (including rows where year is NULL because nothing matched).
To get around this you can do a few different things
You can apply a filter to the publications table in the ON clause when joining. This filters publications before the join, so authors without a 2004 match are kept. Count a column from publications rather than *, so that the unmatched rows are not counted:
SELECT a.ID,
NAME,
count(p.ID) AS Publications_2004
FROM authors_publications AS ap
LEFT JOIN authors AS a
ON ap.AuthorID = a.ID
LEFT JOIN publications AS p
ON p.ID = ap.PaperID AND
p.year = 2004
GROUP BY a.ID, NAME
You could use a case statement instead of a WHERE:
SELECT a.ID,
NAME,
SUM(CASE WHEN p.year = 2004 THEN 1 ELSE 0 END) AS Publications_2004
FROM authors_publications AS ap
LEFT JOIN authors AS a
ON ap.AuthorID = a.ID
LEFT JOIN publications AS p
ON p.ID = ap.PaperID
GROUP BY ap.AuthorID, NAME
You could use a subquery to pre-filter the publications table to only 2004 records, which is just explicitly doing what was implicit in the first option (again counting p.ID so that unmatched rows are not counted):
SELECT a.ID,
NAME,
count(p.ID) AS Publications_2004
FROM authors_publications AS ap
LEFT JOIN authors AS a
ON ap.AuthorID = a.ID
LEFT JOIN (SELECT * FROM publications WHERE year = 2004) AS p
ON p.ID = ap.PaperID
GROUP BY ap.AuthorID, NAME
Also, because you are not aggregating NAME with a formula, you should add that to your GROUP BY otherwise you may get funky results.
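To see the difference between filtering in WHERE and filtering in ON, here is a minimal sketch with the sample data in an in-memory SQLite database (an assumption; the question does not say which engine is used). Unlike the queries above it starts from authors, so an author with no rows at all in authors_publications would also be kept; that is a choice of this sketch, not something the question requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE authors_publications (AuthorID INTEGER, PaperID INTEGER);
CREATE TABLE publications (ID INTEGER PRIMARY KEY, year INTEGER);
INSERT INTO authors VALUES (1, 'John'), (2, 'Sue'), (3, 'Mike');
INSERT INTO authors_publications VALUES (1, 1), (1, 2), (2, 2), (3, 1), (3, 2), (3, 3);
INSERT INTO publications VALUES (1, 2004), (2, 2005), (3, 2004);
""")

# Filter in WHERE: rows that are not from 2004 are removed after the joins,
# and Sue disappears together with her only (2005) paper.
where_filter = conn.execute("""
SELECT a.ID, a.Name, COUNT(*) AS Publications_2004
FROM authors a
LEFT JOIN authors_publications ap ON ap.AuthorID = a.ID
LEFT JOIN publications p ON p.ID = ap.PaperID
WHERE p.year = 2004
GROUP BY a.ID, a.Name ORDER BY a.ID
""").fetchall()

# Filter in ON: non-2004 papers simply fail to match and leave p.ID NULL,
# which COUNT(p.ID) skips.
on_filter = conn.execute("""
SELECT a.ID, a.Name, COUNT(p.ID) AS Publications_2004
FROM authors a
LEFT JOIN authors_publications ap ON ap.AuthorID = a.ID
LEFT JOIN publications p ON p.ID = ap.PaperID AND p.year = 2004
GROUP BY a.ID, a.Name ORDER BY a.ID
""").fetchall()

# Conditional aggregation: keep every row and add 1 only for 2004 papers.
case_sum = conn.execute("""
SELECT a.ID, a.Name,
       SUM(CASE WHEN p.year = 2004 THEN 1 ELSE 0 END) AS Publications_2004
FROM authors a
LEFT JOIN authors_publications ap ON ap.AuthorID = a.ID
LEFT JOIN publications p ON p.ID = ap.PaperID
GROUP BY a.ID, a.Name ORDER BY a.ID
""").fetchall()

print(where_filter)  # [(1, 'John', 1), (3, 'Mike', 2)]
print(on_filter)     # [(1, 'John', 1), (2, 'Sue', 0), (3, 'Mike', 2)]
print(case_sum)      # [(1, 'John', 1), (2, 'Sue', 0), (3, 'Mike', 2)]
```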