SQL Combining Counts with Joins - sql

I have three tables:
Messages
messageid | userid | text
Ex: 1 | 1303 | hey guys
Users
userid | username
Ex:
1303 | trantor
1301 | tranro1
1302 | trantor2
Favorites
messageid | userid
Ex:
1 | 1302
1 | 1301
What I want to do is display a table of usernames with a count of how many of each user's messages were favorited a certain number of times. In the example above, I want a query that asks "how many messages does each user have that have been liked exactly twice?"
and it would show a table that has a row saying
trantor | 1
A natural extension is to replace "exactly twice" with "at least 2", "more than 6", etc. I'm trying to combine COUNT with joins and find myself confused. Since the tables are large, I'm getting counts but am not confident that my query is working correctly. I have read this article but am still confused.
What I have so far:
SELECT USERS.username, COUNT(FAVORITES.id) FROM USERS INNER JOIN FAVORITES ON FAVORITES.userID=USERS.id WHERE COUNT(FAVORITES.id) > 2;
But I don't think it works.
On S.O. I've found these questions on "correlated subqueries" but am thoroughly confused.
Would it be something like this?
SELECT USERS.username,
, ( SELECT COUNT(FAVORTIES.userid)
FROM FAVORITES INNER JOIN ON MESSAGES
WHERE FAVORITES.messageid = MESSAGES.messageid
)
FROM USERS

There are a couple of things you should know about aggregate functions in SQL. First, if you select an aggregate function alongside other columns, you need a GROUP BY. Second, any condition involving an aggregate function belongs in a HAVING clause rather than a WHERE clause.
The GROUP BY is applied to the column(s) you're selecting alongside the aggregate functions.
Here's a basic structure:
SELECT attribute1, COUNT(attribute2)
FROM someTable
GROUP BY attribute1
HAVING COUNT(attribute2) > 2;
Add anything else you need, such as JOINs and ORDER BY.
Note: these clauses have to appear in a certain order. For example, ORDER BY goes after HAVING, which comes after GROUP BY, and so on.
If I'm remembering correctly, the clauses are written in this order:
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
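To see GROUP BY and HAVING working together on the question's sample data, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in database, and the aliases u, m, f and message_count are made up for illustration. The inner query keeps only messages with exactly two favorites, and the outer query counts those messages per user.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users     (userid INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE messages  (messageid INTEGER PRIMARY KEY, userid INTEGER, text TEXT);
    CREATE TABLE favorites (messageid INTEGER, userid INTEGER);

    INSERT INTO users VALUES (1303, 'trantor'), (1301, 'tranro1'), (1302, 'trantor2');
    INSERT INTO messages VALUES (1, 1303, 'hey guys');
    INSERT INTO favorites VALUES (1, 1302), (1, 1301);
""")

query = """
    SELECT u.username, COUNT(*) AS message_count
    FROM users u
    JOIN messages m ON m.userid = u.userid
    JOIN (
        -- one row per message that was favorited exactly twice
        SELECT messageid
        FROM favorites
        GROUP BY messageid
        HAVING COUNT(*) = 2
    ) f ON f.messageid = m.messageid
    GROUP BY u.userid, u.username
    ORDER BY u.username
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('trantor', 1)]
```

Changing the inner condition to HAVING COUNT(*) >= 2 or HAVING COUNT(*) > 6 should cover the "at least 2" and "more than 6" variants.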

When you use an aggregate function such as COUNT(), you need to use GROUP BY, and filter on the aggregate with HAVING rather than WHERE:
SELECT USERS.username, COUNT(FAVORITES.id)
FROM USERS
INNER JOIN FAVORITES
ON FAVORITES.userID=USERS.id
GROUP BY USERS.username
HAVING COUNT(FAVORITES.id) > 2;
From the documentation:
If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.

Related

SQL add columns of each record together

To be blunt I don't know SQL however I don't want the answer, I want to work it out myself.
Here's the question:
Write a SQL query to calculate the number of goals for each team.
players
id name team_id goals
1 Joel 1 3
2 Ed 2 1
3 Simon 2 4
teams
id name
1 New Zealand
2 London
What I'm asking for is an arrow to information that will allow me to solve the question.
I've tried looking myself but I don't even know the correct terminology to ask the question, googling 'write sql to add fields for each row' just seems to return about adding columns or inserting.
You first need to JOIN your tables on the foreign key column (id in teams is linked to team_id in players).
Then you need to GROUP BY and use the aggregate function SUM to get the goals for each team.
So your query will be like:
select t.name, sum(p.goals) as cnt
from players p inner join teams t on p.team_id = t.id
group by t.name
First you have to group players by team: use players.team_id = teams.id to join the rows in the two tables, and then group them with GROUP BY teams.name.
Then use the SUM(value) function, which sums the values.
select teams.name, sum(players.goals) from players, teams where players.team_id = teams.id group by teams.name;
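For anyone who wants to check the JOIN plus GROUP BY plus SUM idea against the sample rows, here is a small sketch. It assumes Python's sqlite3 module as the database, and the alias total_goals is made up. Grouping on t.id as well as t.name is a choice made here so that two teams sharing a name would not be merged.

```python
import sqlite3

# In-memory database with the players and teams rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT, team_id INTEGER, goals INTEGER);

    INSERT INTO teams VALUES (1, 'New Zealand'), (2, 'London');
    INSERT INTO players VALUES (1, 'Joel', 1, 3), (2, 'Ed', 2, 1), (3, 'Simon', 2, 4);
""")

# Join each player to their team, then add up goals within each team.
team_goals = conn.execute("""
    SELECT t.name, SUM(p.goals) AS total_goals
    FROM players p
    INNER JOIN teams t ON p.team_id = t.id
    GROUP BY t.id, t.name
    ORDER BY t.id
""").fetchall()
print(team_goals)  # [('New Zealand', 3), ('London', 5)]
```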

SQL Server : join on array of ID's from previous join

I have 2 tables. One has been pruned to show only IDs which meet certain criteria. The second needs to be pruned to show only data that matches the previous "array" of IDs. There can be multiple results.
Consider the following:
Query_1_final: Returns the IDs of users who meet certain criteria:
select
t1.[user_id]
from
[SQLDB].[db].[meeting_parties] as t1
inner join
(select distinct
[user_id]
from
[SQLDB].[db].[meeting_parties]
group by
[user_id]
having
count([user_id]) = 1) as t2 on t1.user_id = t2.user_id
where
[type] = 'organiser'
This works great and returns:
user_id
--------------------
22
1255
9821
and so on...
It produces a single column with the ID's of everyone who is a "Meeting Organizer" and also in the active_meetings table. (note, there are multiple types/roles, this was the best way to grab them all)
Now, I need this data to filter another table, another join. Here is the start of my query
Query_2_PREP: returns 5 columns where the meeting has "started" already.
SELECT
[meeting_id]
,[meeting_style]
,[meeting_day]
,[address]
,[promos]
FROM
[SQLDB].[db].[all_meetings]
WHERE
[meeting_started] = 'TRUE'
This works as well
meeting_id | meeting_style | meeting_day ...
---------------------------------------------
23 open M,F,SA
23 discussion TU,TH
23 lead W,F
and so on...
and returns ALL 10,982 meetings that started, but I need it to return only the meetings that are from the distinct 'organiser's ID's from Query_1_final (which should be more like 1200 records or so)
Ideally, I need something "like" this below (but of course it does not work)
Query 2: needs to return all meetings that are from organiser ID's only.
SELECT
[meeting_party_id]
,[meeting_style]
,[meeting_day]
,[address]
,[promos]
FROM
[SQLDB].[db].[all_meetings]
WHERE
[meeting_started] = 'TRUE'
AND [meeting_party_id] = "ANY Query_1_final results, especially multiple"
I have tried nesting JOIN and INNER JOIN's but I think there is something fundamental I am missing here about SQL. In PHP I would use an array compare or just run another query... any help would be much appreciated.
Just use IN. Here is the structure of the logic:
with q1 as (
<first query here>
)
SELECT m.*
FROM [SQLDB].[db].[all_meetings] m
WHERE meeting_started = 'TRUE' AND
meeting_party_id IN (SELECT user_id FROM q1);
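Here is a rough, runnable sketch of that CTE plus IN structure. It assumes Python's sqlite3 module in place of SQL Server, drops the [SQLDB].[db] prefix, and uses a handful of made-up rows (user 22 is the only user with a single row of type 'organiser'), so treat the data as purely illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meeting_parties (user_id INTEGER, type TEXT);
    CREATE TABLE all_meetings (meeting_party_id INTEGER, meeting_style TEXT, meeting_started TEXT);

    -- hypothetical sample rows, not taken from the question
    INSERT INTO meeting_parties VALUES
        (22, 'organiser'), (1255, 'organiser'), (1255, 'guest'), (30, 'guest');
    INSERT INTO all_meetings VALUES
        (22, 'open', 'TRUE'), (22, 'lead', 'FALSE'),
        (1255, 'discussion', 'TRUE'), (30, 'open', 'TRUE');
""")

meetings = conn.execute("""
    WITH q1 AS (
        -- first query: users with exactly one row, and that row is an organiser
        SELECT t1.user_id
        FROM meeting_parties AS t1
        INNER JOIN (
            SELECT user_id
            FROM meeting_parties
            GROUP BY user_id
            HAVING COUNT(user_id) = 1
        ) AS t2 ON t1.user_id = t2.user_id
        WHERE t1.type = 'organiser'
    )
    SELECT m.meeting_party_id, m.meeting_style
    FROM all_meetings m
    WHERE m.meeting_started = 'TRUE'
      AND m.meeting_party_id IN (SELECT user_id FROM q1)
    ORDER BY m.meeting_party_id
""").fetchall()
print(meetings)  # [(22, 'open')]
```

The IN list is re-evaluated from q1 each time the query runs, so there is no need to carry an array of IDs between two separate queries.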

Why is a group by clause required when rows are limited in where clause?

fiId is the primary key of Table1. Why does this query return as many rows as there are fiId values in Table1? The fiId is limited to one row in the WHERE clause. The query behaves correctly when GROUP BY Table1.fiId is added, but surely this should not be needed? Thanks.
SELECT
Table1.fiId,
SUM(CASE WHEN Table2.type IN (4,7) THEN Table2.valueToSum ELSE 0 END)
FROM
Table1 INNER JOIN Table3 ON Table1.fiId = Table3.parentId
INNER JOIN Table2 ON Table2.leId = Table3.fiId
WHERE
Table1.fiId = 76813 AND
Table2.insId = 431144
When you use aggregate functions such as SUM and COUNT in your SELECT while also selecting other columns, a GROUP BY that includes those additional columns is required. While I don't know the exact reason behind this, it definitely helps to put the results in context.
Consider the following query:
SELECT Name, Count(Product) as NumOrders
FROM CustomerOrders
GROUP BY Name
Here, we assume that we will get results like this:
Name NumOrders
------------------
Joe 15
Sally 5
Jim 23
Now, if SQL did not require the GROUP BY, then what would you expect the output to be? My best guess would be something like this:
Name NumOrders
------------------
Joe 43
Sally 43
Jim 43
In that case, while there may in fact be 43 order records in the table, including Name doesn't really provide any useful data. Instead, we just have a bunch of names out of context.
For more on this, see a similar question here: Why do I need to explicitly specify all columns in a SQL "GROUP BY" clause - why not "GROUP BY *"?
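The difference between a per-group count and a whole-table count can be sketched with a tiny CustomerOrders table. This assumes Python's sqlite3 module, and the three rows are invented for illustration. With GROUP BY there is one row per name; with no GROUP BY and no other selected column, the aggregate collapses the whole table into a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerOrders (Name TEXT, Product TEXT);
    -- hypothetical sample rows
    INSERT INTO CustomerOrders VALUES
        ('Joe', 'widget'), ('Joe', 'gadget'), ('Sally', 'widget');
""")

# One output row per customer: the count is computed inside each group.
per_name = conn.execute("""
    SELECT Name, COUNT(Product) AS NumOrders
    FROM CustomerOrders
    GROUP BY Name
    ORDER BY Name
""").fetchall()

# No GROUP BY: all rows form one group, so exactly one row comes back.
overall = conn.execute("""
    SELECT COUNT(Product) AS NumOrders
    FROM CustomerOrders
""").fetchall()

print(per_name)  # [('Joe', 2), ('Sally', 1)]
print(overall)   # [(3,)]
```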

Query to ORDER BY the number of rows returned from another SELECT

I'm trying to wrap my head around SQL and I need some help figuring out how to do the following query in PostgreSQL 9.3.
I have a users table, and a friends table that lists user IDs and the user IDs of friends in multiple rows.
I would like to query the user table, and ORDER BY the number of mutual friends in common to a user ID.
So, the friends table would look like:
user_id | friend_user_id
1 | 4
1 | 5
2 | 10
3 | 7
And so on, so user 1 lists 4 and 5 as friends, and user 2 lists 10 as a friend, so I want to sort by the highest count of user 1 in friend_user_id for the result of user_id in the select.
The Postgres way to do this:
SELECT *
FROM users u
LEFT JOIN (
SELECT user_id, count(*) AS friends
FROM friends
GROUP BY user_id
) f USING (user_id)
ORDER BY f.friends DESC NULLS LAST, user_id -- as tiebreaker
The keyword AS is just noise for table aliases. But don't omit it from column aliases. The manual on "Omitting the AS Key Word":
In FROM items, both the standard and PostgreSQL allow AS to be omitted
before an alias that is an unreserved keyword. But this is impractical
for output column names, because of syntactic ambiguities.
Bold emphasis mine.
ISNULL() is a custom extension of MySQL or SQL Server. Postgres uses the SQL-standard function COALESCE(). But you don't need either here. Use the NULLS LAST clause instead, which is faster and cleaner. See:
PostgreSQL sort by datetime asc, null first?
Multiple users will have the same number of friends. These peers would be sorted arbitrarily. Repeated execution might yield different sort order, which is typically not desirable. Add more expressions to ORDER BY as tiebreaker. Ultimately, the primary key resolves any remaining ambiguity.
If the two tables share the same column name user_id (like they should) you can use the syntax shortcut USING in the join clause. Another standard SQL feature. Welcome side effect: user_id is only listed once in the output for SELECT *, as opposed to when joining with ON. Many clients wouldn't even accept duplicate column names in the output.
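A runnable sketch of the aggregate-then-LEFT-JOIN pattern, using the friends rows from the question. It assumes Python's sqlite3 module rather than Postgres, plus a made-up users table holding IDs 1 to 4. Because this is not Postgres, the sketch swaps NULLS LAST for the portable expression `f.friends IS NULL` as the leading sort key, which is meant to achieve the same ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (user_id INTEGER PRIMARY KEY);
    CREATE TABLE friends (user_id INTEGER, friend_user_id INTEGER);

    INSERT INTO users VALUES (1), (2), (3), (4);  -- user 4 lists no friends
    INSERT INTO friends VALUES (1, 4), (1, 5), (2, 10), (3, 7);
""")

ranked = conn.execute("""
    SELECT u.user_id, f.friends
    FROM users u
    LEFT JOIN (
        -- aggregate first: one row per user that has any friends rows
        SELECT user_id, count(*) AS friends
        FROM friends
        GROUP BY user_id
    ) f USING (user_id)
    ORDER BY f.friends IS NULL,   -- users without rows sort last
             f.friends DESC,
             u.user_id            -- tiebreaker for equal counts
""").fetchall()
print(ranked)  # [(1, 2), (2, 1), (3, 1), (4, None)]
```

Users 2 and 3 tie on one friend each, and the user_id tiebreaker keeps their order stable across runs.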
Something like this?
SELECT * FROM [users] u
LEFT JOIN (SELECT user_id, COUNT(*) friends FROM friends GROUP BY user_id) f
ON u.user_id = f.user_id
ORDER BY ISNULL(f.friends,0) DESC

JOIN on another table after GROUP BY and COUNT

I'm trying to make sense of the right way to use JOIN, COUNT(*), and GROUP BY to do a pretty simple query. I've actually gotten it to work (see below) but from what I've read, I'm using an extra GROUP BY that I shouldn't be.
(Note: The problem below isn't my actual problem (which deals with more complicated tables), but I've tried to come up with an analogous problem)
I have two tables:
Table: Person
-------------
key name cityKey
1 Alice 1
2 Bob 2
3 Charles 2
4 David 1
Table: City
-------------
key name
1 Albany
2 Berkeley
3 Chico
I'd like to do a query on Person (with some WHERE clause) that returns:
the number of matching people in each city
the key for the city
the name of the city.
If I do
SELECT COUNT(Person.key) AS count, City.key AS cityKey, City.name AS cityName
FROM Person
LEFT JOIN City ON Person.cityKey = City.key
GROUP BY Person.cityKey, City.name
I get the result that I want
count cityKey cityName
2 1 Albany
2 2 Berkeley
However, I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.
So what's the right way to do this? I've been trying to google for an answer, but I feel like there's something fundamental that I'm just not getting.
I don't think that it's "wrong" in this case, because you've got a one-to-one relationship between city name and city key. You could rewrite it to join City to a sub-select that counts persons per city key, and take the name from City, but it's debatable whether that would be better. It's a matter of style and opinion, I guess.
select PC.ct, City.key, City.name
from City
join (select count(Person.key) ct, cityKey key from Person group by cityKey) PC
on City.key = PC.key
if my SQL isn't too rusty :-)
...I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.
You misunderstand; you've got it backwards.
Standard SQL requires you to specify in the GROUP BY all the columns mentioned in the SELECT that are not wrapped in aggregate functions. If you don't want certain columns in the GROUP BY, wrap them in aggregate functions. Depending on the database, you could use the analytic/windowing function OVER...
However, MySQL and SQLite provide the "feature" where you can omit these columns from the GROUP BY, which leads to no end of "why doesn't this port from MySQL to fill_in_the_blank database?!" questions on Stack Overflow and numerous other sites and forums.
However, I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.
It's not wrong. You have to understand how the Query Optimizer sees your query. The order in which it is parsed is what requires you to "throw the last part in." The optimizer sees your query in something akin to this order:
the required tables are joined
the composite dataset is filtered through the WHERE clause
the remaining rows are chopped into groups by the GROUP BY clause, and aggregated
they are then filtered again, through the HAVING clause
finally operated on, by SELECT / ORDER BY, UPDATE or DELETE.
The point here is that it's not that the GROUP BY has to name all the columns in the SELECT, but in fact it is the opposite - the SELECT cannot include any columns not already in the GROUP BY.
Your query would only work on MySQL, because you group on Person.cityKey but select city.key. All other databases would require you to use an aggregate like min(city.key), or to add City.key to the group by clause.
Because the combination of city name and city key is unique, the following are equivalent:
select count(person.key), min(city.key), min(city.name)
...
group by person.citykey
Or:
select count(person.key), city.key, city.name
...
group by person.citykey, city.key, city.name
Or:
select count(person.key), city.key, max(city.name)
...
group by city.key
All rows in the group will have the same city name and key, so it doesn't matter if you use the max or min aggregate.
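The equivalence of the three forms can be checked on the Person and City rows from the question. This sketch assumes Python's sqlite3 module, quotes the column name "key" to keep it clear of keyword handling, and fills in the elided FROM part with an inner join and the aliases p and c, which is an assumption rather than something the answer spells out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City   ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);

    INSERT INTO City VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
    INSERT INTO Person VALUES
        (1, 'Alice', 1), (2, 'Bob', 2), (3, 'Charles', 2), (4, 'David', 1);
""")

# Assumed stand-in for the "..." in the answer's queries.
JOINED = ' FROM Person p JOIN City c ON p.cityKey = c."key" '

variants = [
    # aggregate every non-grouped column
    'SELECT count(p."key"), min(c."key"), min(c.name)' + JOINED
    + 'GROUP BY p.cityKey ORDER BY 2',
    # list every selected column in the GROUP BY
    'SELECT count(p."key"), c."key", c.name' + JOINED
    + 'GROUP BY p.cityKey, c."key", c.name ORDER BY 2',
    # group on the city key, aggregate the name
    'SELECT count(p."key"), c."key", max(c.name)' + JOINED
    + 'GROUP BY c."key" ORDER BY 2',
]

results = [conn.execute(sql).fetchall() for sql in variants]
for rows in results:
    print(rows)  # [(2, 1, 'Albany'), (2, 2, 'Berkeley')] each time
```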
P.S. If you'd like to count only different persons, even if they have multiple rows, try:
count(DISTINCT person.key)
instead of
count(person.key)