SQL query at least one of something - sql

I have a bunch of Users, each of whom has many Posts.
Schema:
Users: id
Posts: user_id, rating
How do I find all Users who have at least one post with a rating above, say, 10?
I'm not sure if I should use a subQuery for this, or if there's an easier way.
Thanks!

To find all users with at least one post with a rating above 10, use:
SELECT u.*
FROM USERS u
WHERE EXISTS(SELECT NULL
FROM POSTS p
WHERE p.user_id = u.id
AND p.rating > 10)
EXISTS doesn't care about the SELECT list within it - you could replace NULL with 1/0, which you'd expect to raise a division-by-zero error... But it won't, because EXISTS is only concerned with the filtering in the WHERE clause.
The correlation (the WHERE p.user_id = u.id) is why this is called a correlated subquery; it means only rows from the USERS table whose id matches a post are returned, and that post must also satisfy the rating comparison.
EXISTS can also be faster, depending on the situation, because it returns true as soon as the first matching row is found - duplicates don't matter.
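To make the "duplicates don't matter" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and SQLite is only an assumption for the demo; the same queries should behave the same way on other engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE posts (user_id INTEGER, rating INTEGER);
    INSERT INTO users (id) VALUES (1), (2), (3);
    -- user 1: two posts above 10; user 2: none above 10; user 3: no posts
    INSERT INTO posts (user_id, rating) VALUES (1, 15), (1, 20), (2, 5);
""")

# EXISTS: each user is tested once, however many posts match.
exists_rows = conn.execute("""
    SELECT u.id
    FROM users u
    WHERE EXISTS (SELECT NULL
                  FROM posts p
                  WHERE p.user_id = u.id
                    AND p.rating > 10)
    ORDER BY u.id
""").fetchall()

# Plain join without DISTINCT: one output row per matching post.
join_rows = conn.execute("""
    SELECT u.id
    FROM users u
    JOIN posts p ON p.user_id = u.id
    WHERE p.rating > 10
    ORDER BY u.id
""").fetchall()

print(exists_rows)
print(join_rows)
```

With this data the EXISTS query returns user 1 once, while the plain join returns user 1 twice, which is why the join-based answers need DISTINCT or GROUP BY.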

You can join the tables to find the relevant users, and use DISTINCT so each user is in the result set at most once even if they have multiple posts with rating > 10:
select distinct u.id,u.username
from users u inner join posts p on u.id = p.user_id
where p.rating > 10

Use an inner join, with DISTINCT so that a user with several matching posts is returned only once:
SELECT DISTINCT users.* from users INNER JOIN posts p on users.id = p.user_id where p.rating > 10;

select distinct id
from users, posts
where id = user_id and rating > 10

SELECT max(p.rating), u.id
from users u
INNER JOIN posts p on u.id = p.user_id
where p.rating > 10
group by u.id;
Additionally, this will tell you what their highest rating is.

The correct answer for your question as stated is OMG Ponies's answer: WHERE EXISTS is more descriptive and almost always faster. But "SELECT NULL" looks really ugly and counterintuitive to me. I've seen SELECT * or SELECT 1 recommended as a best practice for this.
Another way, in case we're collecting answers:
SELECT u.id
FROM users u
JOIN posts p on u.id = p.user_id
WHERE p.rating > 10
GROUP BY u.id
HAVING COUNT(*) >= 1
This could be useful if the threshold you're testing against isn't always 1.
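As a rough sketch of how the GROUP BY / HAVING form generalizes, the threshold can be passed as a parameter. The helper name users_with_min_posts and the sample data are hypothetical; note that "at least N" corresponds to COUNT(*) >= N.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE posts (user_id INTEGER, rating INTEGER);
    INSERT INTO users (id) VALUES (1), (2), (3);
    INSERT INTO posts (user_id, rating) VALUES
        (1, 15), (1, 20), (2, 11), (2, 3), (3, 4);
""")

def users_with_min_posts(conn, min_rating, min_posts):
    """Return ids of users having at least min_posts posts rated above min_rating."""
    rows = conn.execute("""
        SELECT u.id
        FROM users u
        JOIN posts p ON u.id = p.user_id
        WHERE p.rating > ?
        GROUP BY u.id
        HAVING COUNT(*) >= ?
        ORDER BY u.id
    """, (min_rating, min_posts)).fetchall()
    return [row[0] for row in rows]

at_least_one = users_with_min_posts(conn, 10, 1)
at_least_two = users_with_min_posts(conn, 10, 2)
print(at_least_one, at_least_two)
```

Here users 1 and 2 each have at least one post rated above 10, but only user 1 has two.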

Related

How to print two attribute values from your Sub query table

Suppose I have two tables,
User
Post
Posts are made by Users (i.e. the Post Table will have foreign key of user)
Now my question is,
Print the details of all the users who have more than 10 posts
To solve this, I can type the following query and it would give me the desired result,
SELECT * from USER where user_id in (SELECT user_id from POST group by user_id having count(user_id) > 10)
The problem occurs when I also want to print the count of posts along with the user details. The count can't be obtained from the USER table; it can only be computed from the POST table. But I can't get two values from my subquery, i.e. I can't do the following:
SELECT * from USER where user_id in (SELECT user_id, **count(user_id)** from POST group by user_id having count(user_id) > 10)
So, how do I resolve this issue? One solution I know is the following, but I think it is a very naive approach that makes the query much more complex and also much slower:
SELECT u.*, (SELECT po.count(user_id) from POST as po group by user_id having po.count(user_id) > 10) from USER u where u.user_id in (SELECT p.user_id from POST p group by user_id having p.count(user_id) > 10)
Is there any other way to solve this using subqueries?
Move the aggregation to the from clause:
SELECT u.*, p.num_posts
FROM user u JOIN
(SELECT p.user_id, COUNT(*) as num_posts
FROM post p
GROUP BY p.user_id
HAVING COUNT(*) > 10
) p
ON u.user_id = p.user_id;
You can do this with subqueries:
select u.*
from (select u.*,
(select count(*) from post p where p.user_id = u.user_id) as num_posts
from user u
) u
where num_posts > 10;
With an index on post(user_id), this might actually have better performance than the version using JOIN/GROUP BY.
You can also join the tables; I'd prefer a JOIN over a subquery here:
SELECT user.*, count(post.user_id) AS postcount
FROM user LEFT JOIN post ON user.user_id = post.user_id
GROUP BY user.user_id
HAVING count(post.user_id) > 10;
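The "aggregate in the FROM clause" idea from the answers above can be tried out with a small sketch. The name column, the sample rows, and the helper users_over are assumptions made for the demo, and a threshold of 2 is used instead of 10 to keep the data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (post_id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO user (user_id, name) VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
""")
# ann gets 3 posts, bob gets 1, cy gets none
conn.executemany("INSERT INTO post (user_id) VALUES (?)",
                 [(1,), (1,), (1,), (2,)])

def users_over(conn, threshold):
    """User details plus post count, for users with more than `threshold` posts."""
    return conn.execute("""
        SELECT u.user_id, u.name, p.num_posts
        FROM user u
        JOIN (SELECT user_id, COUNT(*) AS num_posts
              FROM post
              GROUP BY user_id
              HAVING COUNT(*) > ?) p
          ON u.user_id = p.user_id
        ORDER BY u.user_id
    """, (threshold,)).fetchall()

rows = users_over(conn, 2)
print(rows)
```

Because the derived table exposes both user_id and num_posts, the outer query can select the count alongside the user columns, which is what the IN subquery could not do.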

How to count and group by column across one to many relationship while handling 0 case?

I am trying to formulate a single SQL query that will count a table across a one to many relationship. Here is the short version of my schema:
User(id)
Group(id)
UserGroup(user_id, group_id)
Post(id, user_id, group_id)
The goal is to return the count of posts for each user in a group. The specific issue I am running into is my current query cannot return 0 for a user that has no posts. Here is my naive query:
SELECT
COUNT(*) as total,
user_id
FROM
posts
WHERE
group_id = ?
GROUP BY user_id
ORDER BY
total DESC
This works fine when every user has a post, but when some have no posts, they do not show up in the list. How can I write a single query that handles this scenario and returns count 0 for said users? I know I need to somehow incorporate UserGroup to get the list of users, but am stuck from there.
Use a left join:
SELECT u.id, COUNT(p.user_id) as total
FROM users u LEFT JOIN
posts p
ON p.user_id = u.id AND
p.group_id = ?
GROUP BY u.id
ORDER BY total DESC
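One detail worth checking with a LEFT JOIN is what exactly gets counted: COUNT(*) counts the NULL-extended row produced for a user without posts, whereas counting a column from the posts side skips it. A small sketch with made-up data (group id 7 is arbitrary):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, group_id INTEGER);
    INSERT INTO users (id) VALUES (1), (2);
    -- user 1 has two posts in group 7; user 2 has none
    INSERT INTO posts (user_id, group_id) VALUES (1, 7), (1, 7);
""")

QUERY = """
    SELECT u.id, {count_expr} AS total
    FROM users u
    LEFT JOIN posts p
      ON p.user_id = u.id AND p.group_id = ?
    GROUP BY u.id
    ORDER BY total DESC, u.id
"""

# COUNT(*) counts the NULL-extended row, so user 2 shows 1.
count_star = conn.execute(QUERY.format(count_expr="COUNT(*)"), (7,)).fetchall()
# COUNT(p.user_id) ignores NULLs, so user 2 shows 0.
count_col = conn.execute(QUERY.format(count_expr="COUNT(p.user_id)"), (7,)).fetchall()

print(count_star)
print(count_col)
```

Keeping the group_id filter in the ON clause rather than the WHERE clause is what preserves the users without posts.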
I think I got it, but I'm not sure how performant it is:
select count(p.id), u.id
from users u
left join (select * from posts where group_id = ?) p on p.user_id = u.id
where u.id in (select user_id from user_group where group_id = ?)
group by u.id;

How can I get records from one table which do not exist in a related table?

I have this users table:
and this relationships table:
So each user is paired with another one in the relationships table.
Now I want to get a list of users which are not in the relationships table, in either of the two columns (user_id or pair_id).
How could I write that query?
First try:
SELECT users.id
FROM users
LEFT OUTER JOIN relationships
ON users.id = relationships.user_id
WHERE relationships.user_id IS NULL;
Output:
This should display only 2 results: 5 and 6. The result 8 is not correct, as it already exists in relationships. Of course I'm aware that the query is not correct; how can I fix it?
I'm using PostgreSQL.
You need to compare to both values in the ON clause:
SELECT u.id
FROM users u LEFT OUTER JOIN
relationships r
ON u.id = r.user_id or u.id = r.pair_id
WHERE r.user_id IS NULL;
In general, OR in an ON clause can be inefficient. I would recommend replacing this with two NOT EXISTS conditions:
SELECT u.id
FROM users u
WHERE NOT EXISTS (SELECT 1 FROM relationships r WHERE u.id = r.user_id) AND
NOT EXISTS (SELECT 1 FROM relationships r WHERE u.id = r.pair_id);
I like the set operators
select id from users
except
select user_id from relationships
except
select pair_id from relationships
or
select id from users
except
(select user_id from relationships
union
select pair_id from relationships
)
This is a special case of:
Select rows which are not present in other table
I suppose this will be simplest and fastest:
SELECT u.id
FROM users u
WHERE NOT EXISTS (
SELECT 1
FROM relationships r
WHERE u.id IN (r.user_id, r.pair_id)
);
In Postgres, u.id IN (r.user_id, r.pair_id) is just short for: (u.id = r.user_id OR u.id = r.pair_id).
The expression is transformed that way internally, which can be observed from EXPLAIN ANALYZE.
To clear up speculations in the comments: Modern versions of Postgres are going to use matching indexes on user_id, and / or pair_id with this sort of query.
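The NOT EXISTS form can be sketched on a tiny dataset. The rows below are hypothetical (the tables from the question are not reproduced here), and SQLite stands in for Postgres purely so the example is self-contained; the one-column LEFT JOIN from the question is included for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE relationships (user_id INTEGER, pair_id INTEGER);
    INSERT INTO users (id) VALUES (1), (2), (3), (4), (5), (6);
    -- hypothetical pairs: 1 with 2, 3 with 4; users 5 and 6 are unpaired
    INSERT INTO relationships (user_id, pair_id) VALUES (1, 2), (3, 4);
""")

# Anti-join that checks both columns of relationships.
unpaired = [row[0] for row in conn.execute("""
    SELECT u.id
    FROM users u
    WHERE NOT EXISTS (
        SELECT 1
        FROM relationships r
        WHERE u.id IN (r.user_id, r.pair_id)
    )
    ORDER BY u.id
""")]

# Checking only user_id also lets through users that appear as pair_id.
one_sided = [row[0] for row in conn.execute("""
    SELECT u.id
    FROM users u
    LEFT JOIN relationships r ON u.id = r.user_id
    WHERE r.user_id IS NULL
    ORDER BY u.id
""")]

print(unpaired)
print(one_sided)
```

The one-sided query wrongly includes users 2 and 4, which is the same kind of stray result the question describes.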
Something like:
select u.id
from users u
where u.id not in (select r.user_id from relationships r)
and u.id not in (select r.pair_id from relationships r)
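One caveat with NOT IN, shown here as a sketch under the assumption that relationships.pair_id (or user_id) is nullable: a single NULL in the subquery result makes the NOT IN test evaluate to unknown for every non-matching row, so nothing is returned. NOT EXISTS does not have this problem. The data is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE relationships (user_id INTEGER, pair_id INTEGER);
    INSERT INTO users (id) VALUES (1), (2), (3);
    -- hypothetical data: user 1 has a row whose pair_id is still NULL
    INSERT INTO relationships (user_id, pair_id) VALUES (1, NULL);
""")

not_in = conn.execute("""
    SELECT u.id
    FROM users u
    WHERE u.id NOT IN (SELECT r.user_id FROM relationships r)
      AND u.id NOT IN (SELECT r.pair_id FROM relationships r)
    ORDER BY u.id
""").fetchall()

not_exists = conn.execute("""
    SELECT u.id
    FROM users u
    WHERE NOT EXISTS (SELECT 1 FROM relationships r WHERE u.id = r.user_id)
      AND NOT EXISTS (SELECT 1 FROM relationships r WHERE u.id = r.pair_id)
    ORDER BY u.id
""").fetchall()

print(not_in)      # empty: the NULL pair_id poisons the second NOT IN
print(not_exists)  # users 2 and 3
```

If both columns are declared NOT NULL, the two forms return the same rows.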

How to create alias from all columns in sql?

The query here is simplified, but it stands in for a more complex one in which I want to select all users fields from the subquery and also compute a SUM. So, this is an example only.
I'm using a subquery because of a problem with SUM counting duplicate rows, as recommended in this answer: https://stackoverflow.com/a/7351991/255932
But the problem is that the subquery also selects a column "rating" from the table ratings, and I can't select all users fields without listing every users column in the parent select.
SELECT id, name, x, y, z ..., SUM(rating)
FROM
(SELECT users.*, ratings.rating
FROM users
INNER JOIN ratings ON
users.id = ratings.user_id
)
GROUP BY users.id
I would like to know if there is a way to replace (id, name, x, y, z, ...) with a simple (users.*).
Actually, there are two very simple ways.
If users.id is the primary key:
SELECT u.*, sum(r.rating) AS total
FROM users u
JOIN ratings r ON r.user_id = u.id
GROUP BY u.id;
You need Postgres 9.1 or later for this to work. Details in this closely related answer:
PostgreSQL - GROUP BY clause
If users.id is at least unique:
SELECT u.*, r.total
FROM users u
JOIN (
SELECT user_id, sum(rating) AS total
FROM ratings
GROUP BY 1
) r ON r.user_id = u.id;
Works with any version I know of. When retrieving the whole table or large parts of it, it's also generally faster to group first and join later.
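The "group first, join later" variant can be tried in a small sketch. SQLite is used here only as a stand-in so the example runs anywhere, and the name column and sample rows are assumptions; the point is that u.* comes through without listing the columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ratings (user_id INTEGER, rating INTEGER);
    INSERT INTO users (id, name) VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO ratings (user_id, rating) VALUES (1, 4), (1, 5), (2, 3);
""")

rows = conn.execute("""
    SELECT u.*, r.total
    FROM users u
    JOIN (SELECT user_id, SUM(rating) AS total
          FROM ratings
          GROUP BY user_id) r ON r.user_id = u.id
    ORDER BY u.id
""").fetchall()

result = [dict(row) for row in rows]
print(result)
```

Since the aggregation happens inside the derived table, the outer query has no GROUP BY at all, so every users column can be selected freely.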
Kind of, but not really. There is a workaround, but you have to approach your subquery differently.
SELECT (c.users).*, SUM(c.rating)
FROM
(SELECT users, ratings.rating
FROM users
INNER JOIN ratings ON
users.id = ratings.user_id
) c
GROUP BY c.users;

sql using count & group by without using distinct keyword?

I want to optimize this query
SELECT * FROM Users WHERE Active = 1 AND UserId IN (SELECT UserId FROM Users_Roles WHERE RoleId IN (SELECT RoleId FROM Roles WHERE PermissionLevel >= 100)) ORDER BY LastName
Execution time went down when I replaced the query above with joins, as below:
SELECT u.* FROM Users u INNER JOIN Users_Roles ur ON (u.UserId = ur.UserId) INNER JOIN Roles r ON (r.RoleId = ur.RoleId) WHERE u.Active = 1 AND r.PermissionLevel >= 100 GROUP BY u.UserId ORDER BY u.LastName
But the query above gives duplicate records, since my roles table has more than one entry for every user.
I can't use DISTINCT, because there is a function that gets the count for pagination by replacing SELECT * FROM with SELECT COUNT(*) FROM; it then executes both the count query and the result query.
As we know, using COUNT and GROUP BY together gives the wrong result here (one count per group rather than a single total).
Now I want to optimize the query and also find the number of rows, i.e. the count, for it. Please suggest a better way to get the result.
It is difficult to optimise other people's queries without fully knowing the schema, what is indexed and what isn't, how much data there is, what your DBMS is, etc. Even with that, we can't see execution plans, IO statistics, etc. With this in mind, the below may not be better than what you already have, but it is how I would write the query in your situation.
SELECT u.*
FROM Users u
INNER JOIN
( SELECT ur.UserID
FROM Users_Roles ur
INNER JOIN Roles r
ON r.RoleID = ur.RoleID
WHERE r.PermissionLevel >= 100
GROUP BY ur.UserID
) ur
ON u.UserId = ur.UserId
WHERE u.Active = 1
ORDER BY u.LastName
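A sketch of why this shape suits the pagination case: the duplicates are removed inside the derived table, so the outer query has no GROUP BY or DISTINCT, and swapping the select list for COUNT(*) yields a single total. The sample rows and the FROM_AND_WHERE constant are made up for the demo, and the >= 100 threshold is taken from the first query in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserId INTEGER PRIMARY KEY, LastName TEXT, Active INTEGER);
    CREATE TABLE Roles (RoleId INTEGER PRIMARY KEY, PermissionLevel INTEGER);
    CREATE TABLE Users_Roles (UserId INTEGER, RoleId INTEGER);
    INSERT INTO Users VALUES (1, 'Young', 1), (2, 'Adams', 1), (3, 'Baker', 0);
    INSERT INTO Roles VALUES (10, 150), (11, 200), (12, 50);
    -- user 1 holds two qualifying roles; user 2 one; user 3 is inactive
    INSERT INTO Users_Roles VALUES (1, 10), (1, 11), (2, 10), (2, 12), (3, 11);
""")

# Shared tail of the query; only the select list changes between the two uses.
FROM_AND_WHERE = """
    FROM Users u
    INNER JOIN (SELECT ur.UserId
                FROM Users_Roles ur
                INNER JOIN Roles r ON r.RoleId = ur.RoleId
                WHERE r.PermissionLevel >= 100
                GROUP BY ur.UserId) ur
      ON u.UserId = ur.UserId
    WHERE u.Active = 1
"""

page = conn.execute(
    "SELECT u.UserId, u.LastName" + FROM_AND_WHERE + "ORDER BY u.LastName"
).fetchall()
total = conn.execute("SELECT COUNT(*)" + FROM_AND_WHERE).fetchone()[0]

print(page)
print(total)
```

User 1 appears once even though two of their roles qualify, and the count query returns one number rather than one row per group.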