Is there a better way to write this query?

So let's say I'm trying to get a list of all my users who belong to a certain user group. Easy:
SELECT *
FROM users, usergroups
WHERE usergroups.user_group_id = 1
Now, let's say I want to get a list of all users who belong to a certain user group AND who have an email that ends in .edu.
SELECT *
FROM users, usergroups
WHERE usergroups.user_group_id = 1
AND users.email LIKE '%.edu'
That works great. Now let's say we want to get all of the above, plus users belonging to user group 2--but we don't care about the second group's email addresses. This query doesn't work:
SELECT *
FROM users, usergroups
WHERE usergroups.user_group_id IN (1,2)
AND users.email LIKE '%.edu'
That's because it also filters out the group 2 users whose email doesn't end in .edu. Right now I'm doing something like this:
SELECT *
FROM users as usrs, usergroups as groups1, usergroups as groups2
WHERE (groups1.user_group_id = 1 AND usrs.email LIKE '%.edu')
OR groups2.user_group_id = 2
This gives me the results I want, but I hate the way it looks and works. Is there a cleaner way to do this?
EDIT
I didn't include joins on my last iteration up there. Here's what the query should really look like:
SELECT *
FROM users as usrs JOIN
usergroups as groups1 on usrs.group_id = groups1.group_id JOIN
usergroups as groups2 on usrs.group_id = groups2.group_id
WHERE (groups1.user_group_id = 1 AND usrs.email LIKE '%.edu')
OR groups2.user_group_id = 2

There is no need to select usergroups twice using different aliases. You could do simply:
SELECT *
FROM users as usrs, usergroups
WHERE (usergroups.user_group_id = 1 AND usrs.email LIKE '%.edu')
OR usergroups.user_group_id = 2
or, even better (using join):
SELECT *
FROM users as usrs
JOIN usergroups on usergroups.userid = usrs.id
WHERE (usergroups.user_group_id = 1 AND usrs.email LIKE '%.edu')
OR usergroups.user_group_id = 2
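To check that the JOIN + OR version keeps every group 2 user but only the .edu users of group 1, here is a minimal sketch that runs it against an in-memory SQLite database through Python's sqlite3 module. The column names (users.id, usergroups.userid) follow the query above; the sample rows are made up for illustration.

```python
import sqlite3

def make_db():
    """Build a throwaway in-memory database with hypothetical sample rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
        CREATE TABLE usergroups (userid INTEGER, user_group_id INTEGER);
        INSERT INTO users VALUES
            (1, 'ann@uni.edu'), (2, 'bob@corp.com'),
            (3, 'cy@corp.com'), (4, 'di@school.edu');
        INSERT INTO usergroups VALUES (1, 1), (2, 1), (3, 2), (4, 2);
    """)
    return con

def group1_edu_plus_group2(con):
    # Group 1 rows must pass the .edu filter; group 2 rows are taken unfiltered.
    rows = con.execute("""
        SELECT usrs.id
        FROM users AS usrs
        JOIN usergroups ON usergroups.userid = usrs.id
        WHERE (usergroups.user_group_id = 1 AND usrs.email LIKE '%.edu')
           OR usergroups.user_group_id = 2
        ORDER BY usrs.id
    """).fetchall()
    return [r[0] for r in rows]

print(group1_edu_plus_group2(make_db()))  # [1, 3, 4]
```

User 2 is dropped because it is in group 1 without a .edu address; user 3 is kept even though its address is not .edu, because it is in group 2.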

The way you are doing it may work, even though it looks uglier, because the SQL syntax allows it. What doesn't make sense to me is why there is no join between users and usergroups on user id:
... where usergroups.user_id=users.user_id
Unless I am missing something, you are doing a cross join between users and usergroups. It would help us a lot if you listed the columns in each of your tables.
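The cross join can be made visible with a small sketch (Python's sqlite3 module, in-memory SQLite; the tables and rows are hypothetical): without a join condition, every user is paired with every usergroups row, so the row count is the product of the two table sizes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE usergroups (user_id INTEGER, user_group_id INTEGER);
    INSERT INTO users VALUES (1, 'a@x.edu'), (2, 'b@x.com'), (3, 'c@x.edu');
    INSERT INTO usergroups VALUES (1, 1), (2, 1), (3, 2);
""")

# No join condition: 3 users x 3 usergroups rows = 9 combinations.
cross = con.execute("SELECT COUNT(*) FROM users, usergroups").fetchone()[0]

# With the join condition, each user is matched only to its own group rows.
joined = con.execute("""
    SELECT COUNT(*) FROM users, usergroups
    WHERE usergroups.user_id = users.user_id
""").fetchone()[0]

print(cross, joined)  # 9 3
```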

I'll go out on a limb a bit and assume there is a relationship between users and usergroups. You'd then write your query like this:
SELECT *
FROM users as usrs
INNER JOIN usergroups as groups1
ON usrs.GroupID = groups1.GroupID
WHERE (groups1.user_group_id = 1 AND usrs.email LIKE '%.edu')
OR groups1.user_group_id = 2

Fix your JOINs.
Because you have no JOIN condition, you always return every row from users (ignoring the email filter for now) once for every row in usergroups, no matter which group the user belongs to. That is a simple cross join (Cartesian product).
Then, use UNION or UNION ALL to remove the OR. Or leave the OR in place.
SELECT *
FROM
users as usrs
JOIN
usergroups as groups1 ON usrs.foo = groups1.foo
WHERE
groups1.user_group_id = 1 AND usrs.email LIKE '%.edu'
UNION --maybe UNION ALL
SELECT *
FROM
users as usrs
JOIN
usergroups as groups2 ON usrs.foo = groups2.foo
WHERE
groups2.user_group_id = 2
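As a sanity check that the UNION form returns the same users as a single join with OR, here is a sketch on hypothetical data, run through Python's sqlite3 module. The placeholder join column foo is replaced by an assumed user_id column.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE usergroups (user_id INTEGER, user_group_id INTEGER);
    INSERT INTO users VALUES
        (1, 'ann@uni.edu'), (2, 'bob@corp.com'), (3, 'cy@corp.com');
    INSERT INTO usergroups VALUES (1, 1), (2, 1), (3, 2);
""")

# Each branch of the UNION handles one group; UNION also removes duplicates.
union_sql = """
    SELECT usrs.user_id FROM users AS usrs
    JOIN usergroups AS groups1 ON usrs.user_id = groups1.user_id
    WHERE groups1.user_group_id = 1 AND usrs.email LIKE '%.edu'
    UNION
    SELECT usrs.user_id FROM users AS usrs
    JOIN usergroups AS groups2 ON usrs.user_id = groups2.user_id
    WHERE groups2.user_group_id = 2
"""

# The same filter expressed with a single join and OR.
or_sql = """
    SELECT DISTINCT usrs.user_id FROM users AS usrs
    JOIN usergroups AS g ON usrs.user_id = g.user_id
    WHERE (g.user_group_id = 1 AND usrs.email LIKE '%.edu')
       OR g.user_group_id = 2
"""

union_ids = sorted(r[0] for r in con.execute(union_sql))
or_ids = sorted(r[0] for r in con.execute(or_sql))
print(union_ids, or_ids)  # [1, 3] [1, 3]
```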

I don't see anything wrong with your latest edit with the joins in place. You could use a UNION, but I think that would be uglier.

Going off of what Michael Goldshteyn said about re-writing it using JOINS, and Joe Stefnelli's comment about the cross join, your initial query, rewritten, would be:
SELECT *
FROM users
JOIN user_groups ON users.user_group_id = user_groups.user_group_id
WHERE users.email LIKE '%.edu'
AND user_groups.user_group_id = 1
Adding the second group would result in this:
SELECT *
FROM users
JOIN user_groups ON users.user_group_id = user_groups.user_group_id
WHERE ( ( users.email LIKE '%.edu' AND user_groups.user_group_id = 1 )
OR user_groups.user_group_id = 2 )
Or you could even do a union (personally I wouldn't do this):
SELECT *
FROM users AS users_1
JOIN user_groups AS user_groups_1 ON users_1.user_group_id = user_groups_1.user_group_id
WHERE users_1.email LIKE '%.edu'
AND user_groups_1.user_group_id = 1
UNION
SELECT *
FROM users AS users_2
JOIN user_groups AS user_groups_2 ON users_2.user_group_id = user_groups_2.user_group_id
AND user_groups_2.user_group_id = 2

Optimizing the Query.
For large tables (maybe this is not your case) you should think about the performance penalties your query might have. So I prefer the following approach: first I select the rows I'm going to work with into a temporary table, then I delete the rows I don't need, and finally I select the result set and drop the temporary table. Note: this query uses Transact-SQL (SQL Server) syntax.
select u.*, g.user_group_id into #TEMP from users u, usergroups g where u.group_id = g.group_id and g.user_group_id in (1,2)
delete from #TEMP where user_group_id = 1 and email not like '%.edu'
select * from #TEMP
drop table #TEMP
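SQLite has no SELECT ... INTO #TEMP, but the same three steps can be sketched with CREATE TEMP TABLE ... AS SELECT. This is an adaptation to SQLite via Python's sqlite3 module, not the T-SQL above; the schema, the temporary table name tmp and the rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, group_id INTEGER);
    CREATE TABLE usergroups (group_id INTEGER PRIMARY KEY, user_group_id INTEGER);
    INSERT INTO usergroups VALUES (10, 1), (20, 2), (30, 3);
    INSERT INTO users VALUES
        (1, 'ann@uni.edu', 10), (2, 'bob@corp.com', 10),
        (3, 'cy@corp.com', 20), (4, 'di@school.edu', 30);
""")

# 1. Copy the candidate rows (user groups 1 and 2) into a temporary table.
con.execute("""
    CREATE TEMP TABLE tmp AS
    SELECT u.*, g.user_group_id
    FROM users u JOIN usergroups g ON u.group_id = g.group_id
    WHERE g.user_group_id IN (1, 2)
""")
# 2. Delete the group 1 rows whose email does not end in .edu.
con.execute("DELETE FROM tmp WHERE user_group_id = 1 AND email NOT LIKE '%.edu'")
# 3. Read the result, then drop the temporary table.
result = [r[0] for r in con.execute("SELECT id FROM tmp ORDER BY id")]
con.execute("DROP TABLE tmp")
print(result)  # [1, 3]
```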

Related

Two tables get values with common value or no value in other table

We have two tables.
Table 1 - Users: Contains the users
Table 2 - Restrictions: users that can only access certain rooms. If a user is not in this table, they can access all the rooms.
Now I need a query where I pass the room and it returns the users that have access. For example, I pass RoomId = 70 and my expected result is 1, 3.
User 1 because it has access only to room 70, and user 3 because it is not in the restrictions table and therefore has access to all rooms.
The problem is that with an inner join I lose user 3, and with a left join I keep user 2. So I cannot figure out how to relate the tables. Is there any way to do it directly with joins?
You can try a UNION:
SELECT usr.UserID FROM Users usr
JOIN Restrictions res ON usr.UserID = res.UserID
WHERE res.RoomID = #myParameter --we take the users permitted for given room
UNION
SELECT UserID
FROM Users
WHERE UserID NOT IN (
SELECT DISTINCT UserID
FROM Restrictions
) --plus the users that have permission in all rooms
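A quick way to try the UNION query is an in-memory SQLite database driven from Python's sqlite3 module. The sample rows are assumptions chosen to match the example (user 1 restricted to room 70, user 2 to another room, user 3 unrestricted), and sqlite3's ? placeholder stands in for #myParameter.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY);
    CREATE TABLE Restrictions (UserID INTEGER, RoomID INTEGER);
    INSERT INTO Users VALUES (1), (2), (3);
    -- User 1 may only use room 70, user 2 only room 71, user 3 is unrestricted.
    INSERT INTO Restrictions VALUES (1, 70), (2, 71);
""")

def users_for_room(room_id):
    sql = """
        SELECT usr.UserID FROM Users usr
        JOIN Restrictions res ON usr.UserID = res.UserID
        WHERE res.RoomID = ?              -- users explicitly allowed in this room
        UNION
        SELECT UserID FROM Users
        WHERE UserID NOT IN (SELECT DISTINCT UserID FROM Restrictions)
                                          -- users with no restrictions at all
        ORDER BY 1
    """
    return [r[0] for r in con.execute(sql, (room_id,))]

print(users_for_room(70))  # [1, 3]
```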
I think this should work. You don't really need a join to solve the issue.
select
userid
from
users u
where
userid in (
select
userid
from
restrictions
where
roomid = #roomId
)
or userid not in (
select
userid
from
restrictions
)
I strongly advise you not to use not in with a subquery. It is just a dangerous habit. If the subquery returns any NULL values, then the outer query returns no rows.
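The NULL pitfall can be reproduced with a small sketch using Python's sqlite3 module; the tables and the stray NULL row are hypothetical. NOT IN compares each userid against a list that contains NULL, so the comparison is never true, while NOT EXISTS is unaffected.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY);
    CREATE TABLE restrictions (userid INTEGER, roomid INTEGER);
    INSERT INTO users VALUES (1), (2), (3);
    -- A stray row with a NULL userid poisons NOT IN.
    INSERT INTO restrictions VALUES (1, 70), (2, 71), (NULL, 72);
""")

# 3 NOT IN (1, 2, NULL) evaluates to unknown, so no row qualifies.
not_in = con.execute("""
    SELECT userid FROM users
    WHERE userid NOT IN (SELECT userid FROM restrictions)
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULLs do no harm.
not_exists = con.execute("""
    SELECT userid FROM users u
    WHERE NOT EXISTS (SELECT 1 FROM restrictions r WHERE r.userid = u.userid)
""").fetchall()

print(not_in, not_exists)  # [] [(3,)]
```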
So, I would recommend:
select userid
from users u
where exists (select 1
from restrictions r
where r.userid = u.userid and
r.roomid = 70
) or
not exists (select 1
from restrictions r
where r.userid = u.userid
);
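Running the EXISTS / NOT EXISTS query on the same kind of hypothetical data (SQLite via Python's sqlite3 module, with a ? placeholder instead of the hard-coded 70) gives the expected 1, 3.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY);
    CREATE TABLE restrictions (userid INTEGER, roomid INTEGER);
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO restrictions VALUES (1, 70), (2, 71);
""")

def users_with_access(room_id):
    sql = """
        SELECT userid FROM users u
        -- explicitly allowed in this room ...
        WHERE EXISTS (SELECT 1 FROM restrictions r
                      WHERE r.userid = u.userid AND r.roomid = ?)
        -- ... or not restricted at all
           OR NOT EXISTS (SELECT 1 FROM restrictions r
                          WHERE r.userid = u.userid)
        ORDER BY userid
    """
    return [row[0] for row in con.execute(sql, (room_id,))]

print(users_with_access(70))  # [1, 3]
```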
I would also strongly advise you to change your data model. If someone has access to one room and that row is deleted from restrictions, then they will have access to all rooms. That seems dangerous. You should explicitly list all rooms users have access to.

Is it true that JOINS can be used everywhere to replace Subqueries in SQL

I heard people say that table joins can replace subqueries everywhere. I tested this with my query, but I only got the right data set when I used a subquery; I was not able to get the same data set using joins. I am not sure my finding is right, because I am new to RDBMSs and not very experienced. I will try to describe the schema of the database I was experimenting with:
The database has two tables:
Users (ID, Name, City) and Friendship (ID, Friend_ID)
Goal: The Users table stores simple user data, and the Friendship table represents friendships between users. Both columns of Friendship are foreign keys referencing Users.ID, so Friendship models a many-to-many relationship between users.
Question: I have to retrieve Users.ID and Users.Name of all users who are not friends with a particular user x but are from the same city (much like Facebook's friend suggestions).
Using a subquery, I am able to achieve this. The query looks like:
SELECT ID, NAME
FROM USERS AS U
WHERE U.ID NOT IN (SELECT FRIENDS_ID
FROM FRIENDSHIP,
USERS
WHERE USERS.ID = FRIENDSHIP.ID AND USERS.ID = x)
AND U.ID != x AND CITY LIKE '% A_CITY%';
Example entries:
Users
Id | Name | City
1 | Jon | Mumbai
2 | Doe | Mumbai
3 | Arun | Mumbai
4 | Prakash | Delhi
Friendship
Id | Friends_Id
1 | 2
2 | 1
2 | 3
3 | 2
Can I get the same data set in a single query using joins? If so, how? Please let me know if my question is not clear. Thanks.
Note: I used an inner join in the subquery by listing both tables, Friendship and Users. Omitting the Users table and using the outer alias U instead gives an error. (If I don't alias the outer Users table, the query becomes syntactically valid, but its result includes the IDs and names of users who have more than one friend, including the user with ID x. Interesting, but not the topic of this question.)
Instead of NOT IN, you can use a LEFT JOIN and check for IS NULL:
select u.id, u.name
from Users u
left join Friendship f on u.id = f.id and f.friend_id = #person
where u.city like '%city%' and f.friend_id is null and u.id <> #person;
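Here is the LEFT JOIN / IS NULL pattern run against the example rows from the question, using an in-memory SQLite database via Python's sqlite3 module. Named parameters :person and :city stand in for #person and the city pattern; the column name friend_id is an assumption.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE Friendship (id INTEGER, friend_id INTEGER);
    INSERT INTO Users VALUES
        (1, 'Jon', 'Mumbai'), (2, 'Doe', 'Mumbai'),
        (3, 'Arun', 'Mumbai'), (4, 'Prakash', 'Delhi');
    -- Each friendship is stored in both directions, as in the question.
    INSERT INTO Friendship VALUES (1, 2), (2, 1), (2, 3), (3, 2);
""")

def suggestions(person, city):
    sql = """
        SELECT u.id, u.name
        FROM Users u
        LEFT JOIN Friendship f ON u.id = f.id AND f.friend_id = :person
        WHERE u.city LIKE :city
          AND f.friend_id IS NULL      -- no friendship row: not a friend
          AND u.id <> :person          -- exclude the person themselves
        ORDER BY u.id
    """
    return con.execute(sql, {"person": person, "city": f"%{city}%"}).fetchall()

print(suggestions(1, "Mumbai"))  # [(3, 'Arun')]
```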
There are some cases where you can't work your way out with just inner/left/right joins, but your case is not one of them.
Please check sql fiddle: http://sqlfiddle.com/#!9/1c5b1/14
Also about your note: what you tried to do can be achieved with a LATERAL join or CROSS APPLY, depending on the engine you are using.
You can rewrite your query using only joins. The trick is to join the USERS table to itself with an inner join to identify users in the same city, and to reference the FRIENDSHIP table with a left join and a null check to identify non-friends.
SELECT
U1.ID,
U1.Name
FROM
USERS U1
INNER JOIN
USERS U2
ON
U1.CITY = U2.CITY
LEFT JOIN
FRIENDSHIP F
ON
U2.ID = F.ID AND
U1.ID = F.FRIEND_ID
WHERE
U2.id = X AND
U1.ID <> U2.id AND
F.id IS NULL
The above query doesn't handle the situation where user x's primary key is in the FRIEND_ID column of the FRIENDSHIP table. Since your subquery version doesn't handle that situation either, I assume you create two rows for each friendship, or that friendships are not bidirectional.
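The self-join version takes the city from user x itself, so only x has to be passed in. A sketch on the question's example rows (SQLite via Python's sqlite3 module; the ? placeholder stands in for X, and the column name FRIEND_ID follows the query above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE USERS (ID INTEGER PRIMARY KEY, NAME TEXT, CITY TEXT);
    CREATE TABLE FRIENDSHIP (ID INTEGER, FRIEND_ID INTEGER);
    INSERT INTO USERS VALUES
        (1, 'Jon', 'Mumbai'), (2, 'Doe', 'Mumbai'),
        (3, 'Arun', 'Mumbai'), (4, 'Prakash', 'Delhi');
    INSERT INTO FRIENDSHIP VALUES (1, 2), (2, 1), (2, 3), (3, 2);
""")

def same_city_non_friends(x):
    sql = """
        SELECT U1.ID, U1.NAME
        FROM USERS U1
        -- U2 is user x; U1 ranges over everyone in the same city
        INNER JOIN USERS U2 ON U1.CITY = U2.CITY
        LEFT JOIN FRIENDSHIP F ON U2.ID = F.ID AND U1.ID = F.FRIEND_ID
        WHERE U2.ID = ?
          AND U1.ID <> U2.ID
          AND F.ID IS NULL   -- no friendship row from x to U1
        ORDER BY U1.ID
    """
    return con.execute(sql, (x,)).fetchall()

print(same_city_non_friends(1))  # [(3, 'Arun')]
```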
Joins and subqueries can be used to achieve similar results in some cases, but certainly not all. As an example, this query with a subquery could not be achieved with a simple join, since joining MYTABLE to itself would multiply the rows counted by COUNT(*):
SELECT ID, COLUMN1, COUNT(*) FROM MYTABLE
WHERE ID IN (
SELECT DISTINCT ID FROM MYTABLE
WHERE COLUMN2 NOT IN (VALUES1, VALUES2)
)
GROUP BY ID;
This is only one example, but there are many.
Conversely, you cannot get information from another table by using a subquery without joining it.
As to your example
SELECT ID, NAME FROM USERS AS U
WHERE U.ID NOT IN (
SELECT FRIENDS_ID FROM FRIENDSHIP, USERS
WHERE USERS.ID = FRIENDSHIP.ID AND USERS.ID = x)
AND U.ID != x AND CITY LIKE '% A_CITY%';
This could be constructed with a LEFT JOIN (aka LEFT OUTER JOIN) that keeps only the users with no matching FRIENDSHIP row:
select u.ID, u.NAME from users u
left join FRIENDSHIP f on f.FRIENDS_ID = u.ID and f.ID = x
where f.ID is null
and u.ID != x
and CITY like '% A_CITY%';
The left join keeps every user, and the f.ID is null check removes anyone who is listed as a friend of x.

Using binary logic in PostgreSQL JOIN queries

I've got 3 tables that look vaguely like this:
Users
----------
UserID
Name
Phone
User Groups
-----------
GroupID
Activity
Group Membership
---------------
UserID
GroupID
Independent Actives
-------------------
UserID
Activity
The idea is that a user can perform an activity either as part of a group or on their own. What I want to do is return all the people that partake in a certain activity. What I have been able to write so far lets me return all the users which are in groups that undertake that activity. What I want to add to this is the ability to see the people that do the activity independently. This is what I have so far:
SELECT
users.name, users.phone, user_groups.activity
FROM users
INNER JOIN group_membership ON group_membership.userID = users.userID
INNER JOIN user_groups ON user_groups.groupID = group_membership.groupID
WHERE user_groups.activity = 'Knitting';
The above bit works fine and it shows all of the users that are part of groups that do knitting, but I also want it to show all the users that are knitting independently. This is what I have attempted to add:
SELECT
users.name, users.phone, user_groups.activity
FROM users
INNER JOIN group_membership ON group_membership.userID = users.userID
INNER JOIN user_groups ON user_groups.groupID = group_membership.groupID
INNER JOIN independent_activity ON independent_activity.userID = users.userID
WHERE user_groups.activity = 'Knitting' OR independent_activity.activity = 'Knitting';
The problem here is the syntax: I understand the logic I'm trying to implement, but I don't know how to express it in SQL, so any help is appreciated.
You could use a UNION in this case:
SELECT users.NAME
,users.phone
,user_groups.activity
FROM users
INNER JOIN group_membership ON group_membership.userID = users.userID
INNER JOIN user_groups ON user_groups.groupID = group_membership.groupID
WHERE user_groups.activity = 'Knitting'
UNION
SELECT users.NAME
,users.phone
,independent_activity.activity
FROM users
INNER JOIN independent_activity ON independent_activity.userID = users.userID
WHERE independent_activity.activity = 'Knitting';
You might also want to look up the differences between UNION and UNION ALL and decide which one suits your requirements.
You've got a working answer from SoulTrain. However, for completeness' sake I'd like to mention that you don't have to join all those tables. (You could use outer joins here and remove duplicate matches with DISTINCT, but that's not necessary. You don't have to query the users table twice either. And you don't need UNION to do the DISTINCT job.)
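The difference between UNION and UNION ALL shows up as soon as someone knits both in a group and on their own. A sketch with hypothetical rows, run through Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (userID INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    CREATE TABLE user_groups (groupID INTEGER PRIMARY KEY, activity TEXT);
    CREATE TABLE group_membership (userID INTEGER, groupID INTEGER);
    CREATE TABLE independent_activity (userID INTEGER, activity TEXT);
    INSERT INTO users VALUES
        (1, 'Ann', '555-0101'), (2, 'Bob', '555-0102'), (3, 'Cy', '555-0103');
    INSERT INTO user_groups VALUES (10, 'Knitting'), (20, 'Chess');
    INSERT INTO group_membership VALUES (1, 10), (2, 20);
    -- Ann knits both in a group and on her own; Cy knits only on his own.
    INSERT INTO independent_activity VALUES (1, 'Knitting'), (3, 'Knitting');
""")

# The two branches from the answer; {op} is filled with UNION or UNION ALL.
BRANCHES = """
    SELECT users.name, users.phone, user_groups.activity
    FROM users
    JOIN group_membership ON group_membership.userID = users.userID
    JOIN user_groups ON user_groups.groupID = group_membership.groupID
    WHERE user_groups.activity = 'Knitting'
    {op}
    SELECT users.name, users.phone, independent_activity.activity
    FROM users
    JOIN independent_activity ON independent_activity.userID = users.userID
    WHERE independent_activity.activity = 'Knitting'
"""

union_rows = con.execute(BRANCHES.format(op="UNION")).fetchall()
union_all_rows = con.execute(BRANCHES.format(op="UNION ALL")).fetchall()
# UNION drops Ann's duplicate row; UNION ALL keeps it.
print(len(union_rows), len(union_all_rows))  # 2 3
```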
Simply select from the one table you want to display data from, i.e. the users table, and then use EXISTS or IN to get only those users that are either in one set or another.
select name, phone
from users
where userid in
(
select userid
from independent_actives
where activity = 'Knitting'
)
or userid in
(
select userid
from group_membership
where groupid in (select groupid from user_groups where activity = 'Knitting')
)
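A sketch of the IN version on hypothetical rows (SQLite via Python's sqlite3 module). Because the filter is applied to users rows rather than to joined rows, each user appears at most once without DISTINCT or UNION:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    CREATE TABLE user_groups (groupid INTEGER PRIMARY KEY, activity TEXT);
    CREATE TABLE group_membership (userid INTEGER, groupid INTEGER);
    CREATE TABLE independent_actives (userid INTEGER, activity TEXT);
    INSERT INTO users VALUES
        (1, 'Ann', '555-0101'), (2, 'Bob', '555-0102'), (3, 'Cy', '555-0103');
    INSERT INTO user_groups VALUES (10, 'Knitting'), (20, 'Chess');
    INSERT INTO group_membership VALUES (1, 10), (2, 20);
    INSERT INTO independent_actives VALUES (1, 'Knitting'), (3, 'Knitting');
""")

rows = con.execute("""
    SELECT name, phone
    FROM users
    -- knits independently ...
    WHERE userid IN (SELECT userid FROM independent_actives
                     WHERE activity = 'Knitting')
    -- ... or belongs to a group that knits
       OR userid IN (SELECT userid FROM group_membership
                     WHERE groupid IN (SELECT groupid FROM user_groups
                                       WHERE activity = 'Knitting'))
    ORDER BY name
""").fetchall()

# Each user is listed once, even Ann, who matches both conditions.
print(rows)  # [('Ann', '555-0101'), ('Cy', '555-0103')]
```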

SQL IN query from multiple tables

SELECT DISTINCT addresses.email FROM addresses
WHERE addresses.user_id IN (SELECT user_group.id_user_groups FROM user_group
WHERE id_group_groups IN (SELECT news_group.groupid_newsg FROM news_group
WHERE newsid_news_good=1))
The SQL query above never finishes; it hangs until I stop it. I have tried the SQL operator UNION after the first SELECT statement, but then it displays all the email addresses, including those that do not belong to a group. I want to select only the email addresses of users who belong to "id_group_groups = 5" (please see the query below) and are subscribed to "newsid_news_good = 1".
The following query runs perfectly fine:
SELECT DISTINCT addresses.email FROM addresses
WHERE addresses.user_id IN (SELECT user_group.id_user_groups FROM user_group
WHERE id_group_groups =5 )
Does anybody have an idea what the problem with the first query is? Any help would be appreciated!
I think the subselects complicate your problem. If I understand it right, it would be easier to solve your problem using joins instead of subselects.
Try out something like this:
SELECT DISTINCT addresses.email
FROM addresses
JOIN user_group
ON user_group.id_user_groups = addresses.user_id
JOIN news_group
ON news_group.groupid_newsg = user_group.id_group_groups
WHERE newsid_news_good = 1
SELECT DISTINCT addresses.email FROM addresses
INNER JOIN (
SELECT user_group.id_user_groups FROM user_group
INNER JOIN news_group
ON news_group.groupid_newsg = id_group_groups
WHERE newsid_news_good=1
) AS ug
ON ug.id_user_groups = addresses.user_id
You want to use joins. The subqueries you are using are most likely the cause of your performance woes.
SELECT DISTINCT a.email
FROM addresses a
INNER JOIN user_group u ON u.id_user_groups = a.user_id AND u.id_group_groups = 5
INNER JOIN news_group n ON n.groupid_newsg = u.id_group_groups AND n.newsid_news_good = 1
I'm going to guess that you are using MySQL, because it does a really poor job of executing subqueries inside IN. The canonical way to fix this is to change them to EXISTS with correlated subqueries:
SELECT DISTINCT a.email
FROM addresses a
where exists (select 1
from user_group ug
where ug.id_user_groups = a.user_id and
exists (select 1
from news_group ng
where ng.newsid_news_good = 1 and
ng.groupid_newsg = ug.id_group_groups
)
)
This solution works in all databases, of course; it is just much more efficient in MySQL. Assuming email is not repeated in the addresses table, you can drop the outer DISTINCT.
The alternatives with joins are also feasible, but they require the DISTINCT.
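To see why the join needs DISTINCT and the EXISTS form does not, here is a sketch with made-up rows where one user belongs to two subscribed groups (SQLite via Python's sqlite3 module; the column names come from the question, the data does not):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE addresses (user_id INTEGER, email TEXT);
    CREATE TABLE user_group (id_user_groups INTEGER, id_group_groups INTEGER);
    CREATE TABLE news_group (groupid_newsg INTEGER, newsid_news_good INTEGER);
    INSERT INTO addresses VALUES
        (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
    -- User 1 is in two subscribed groups, user 3 only in an unsubscribed one.
    INSERT INTO user_group VALUES (1, 5), (1, 6), (2, 6), (3, 7);
    INSERT INTO news_group VALUES (5, 1), (6, 1), (7, 0);
""")

# Correlated EXISTS: each address row is tested once, so no duplicates.
exists_rows = con.execute("""
    SELECT a.email
    FROM addresses a
    WHERE EXISTS (SELECT 1 FROM user_group ug
                  WHERE ug.id_user_groups = a.user_id
                    AND EXISTS (SELECT 1 FROM news_group ng
                                WHERE ng.newsid_news_good = 1
                                  AND ng.groupid_newsg = ug.id_group_groups))
    ORDER BY a.email
""").fetchall()

# Plain join: one output row per matching group, hence the need for DISTINCT.
join_rows = con.execute("""
    SELECT a.email
    FROM addresses a
    JOIN user_group ug ON ug.id_user_groups = a.user_id
    JOIN news_group ng ON ng.groupid_newsg = ug.id_group_groups
    WHERE ng.newsid_news_good = 1
    ORDER BY a.email
""").fetchall()

print(exists_rows)  # [('a@example.com',), ('b@example.com',)]
print(join_rows)    # [('a@example.com',), ('a@example.com',), ('b@example.com',)]
```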

DISTINCT or GROUP BY used with NOT IN returns wrong results

I have two tables
users with columns: uid, name, mail, etc
users_roles with columns: uid, rid
In users_roles, each time a role is added to a user, it is listed in the users_roles table. So, let's say user 1 has roles 1 and 4. In the table:
users_roles:
uid | rid
1 | 1
1 | 4
I need to return all users who don't have roles 4 OR 5. I have tried using both GROUP BY and DISTINCT combined with NOT IN. The problem I keep running into is that if a user has both roles 1 and 4, they will be returned in the results. Below is an example of my GROUP BY query:
SELECT *
FROM users AS u
LEFT JOIN users_roles AS ur ON u.uid = ur.uid
WHERE ur.rid NOT
IN ( 4, 5 )
GROUP BY ur.uid
I have tried sub-queries as well to no avail because the issue seems to be that Group By combines rows after finishing the query. So, it simply finds the record containing uid 1 rid 4 and returns it in the results.
The Drupal module Views that I can't use (due to security issues with Views Bulk Operations) accomplishes the desired results by doing the following:
LEFT JOIN users_roles ON users.uid = users_roles.uid
AND (users_roles.rid = '4' OR users_roles.rid = '5')
For long term maintenance I don't want to have to update the code every single time we add a role and this is going to make for one long query.
I looked at the following:
Aggregating Rows
Filtering distinct rows in SQL
While there are Drupal functions that will let me get the list of role ids where I could unset the roles I don't want show up in the resulting array, I feel like I am missing a fundamental understanding of SQL. Is there a better way to do this in SQL?
I need to return all users who don't have roles 4 & 5
select *
from users u
where not exists
(
select *
from users_roles ur
where ur.rid in (4,5)
and ur.uid = u.uid
)
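A sketch of the NOT EXISTS query on hypothetical data (SQLite via Python's sqlite3 module). Passing the excluded role ids as parameters, rather than writing them into the SQL, also addresses the concern about editing the query each time a role is added; the helper users_without_roles is made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users_roles (uid INTEGER, rid INTEGER);
    INSERT INTO users VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e');
    -- User 5 has no roles at all.
    INSERT INTO users_roles VALUES (1, 1), (1, 4), (2, 1), (3, 5), (4, 4), (4, 5);
""")

def users_without_roles(excluded):
    # Hypothetical helper: one ? placeholder per excluded role id.
    placeholders = ", ".join("?" for _ in excluded)
    sql = f"""
        SELECT u.uid FROM users u
        WHERE NOT EXISTS (SELECT 1 FROM users_roles ur
                          WHERE ur.uid = u.uid AND ur.rid IN ({placeholders}))
        ORDER BY u.uid
    """
    return [r[0] for r in con.execute(sql, list(excluded))]

print(users_without_roles([4, 5]))  # [2, 5]
```

User 1 is excluded because of its role 4 row, even though it also has role 1; user 5 is kept because it has no role rows at all.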
If you want to exclude only users who have both roles 4 and 5 (not users who have just one of them), you can use:
select *
from users u
where not exists
(
select uid
from users_roles ur
where ur.rid in (4,5)
and ur.uid = u.uid
group by uid having count(distinct rid)=2
)
If the list is long, you can put all the values in a mapping table and use that in the above query.
I would do it like this:
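A sketch of the "both roles" variant on hypothetical data (SQLite via Python's sqlite3 module): only a user holding both role 4 and role 5 is excluded.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users_roles (uid INTEGER, rid INTEGER);
    INSERT INTO users VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e');
    -- Only user 4 has both role 4 and role 5.
    INSERT INTO users_roles VALUES (1, 1), (1, 4), (2, 1), (3, 5), (4, 4), (4, 5);
""")

ids = [r[0] for r in con.execute("""
    SELECT u.uid FROM users u
    WHERE NOT EXISTS (
        SELECT ur.uid FROM users_roles ur
        WHERE ur.rid IN (4, 5) AND ur.uid = u.uid
        GROUP BY ur.uid
        HAVING COUNT(DISTINCT ur.rid) = 2   -- holds both excluded roles
    )
    ORDER BY u.uid
""")]

print(ids)  # [1, 2, 3, 5]
```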
SELECT *
FROM users AS u
LEFT JOIN users_roles AS ur ON (u.uid = ur.uid AND ur.rid IN ( 4, 5 ) )
WHERE ur.rid IS NULL
GROUP BY u.uid
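Where the role filter goes matters. This sketch (SQLite via Python's sqlite3 module, hypothetical rows) compares the question's filter in WHERE with the filter moved into the ON clause:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users_roles (uid INTEGER, rid INTEGER);
    INSERT INTO users VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e');
    INSERT INTO users_roles VALUES (1, 1), (1, 4), (2, 1), (3, 5), (4, 4), (4, 5);
""")

# Filter in WHERE (the question's approach): user 1 slips through via its
# role 1 row, and user 5 (no roles, so rid IS NULL) is dropped because
# NULL NOT IN (4, 5) is unknown.
in_where = [r[0] for r in con.execute("""
    SELECT DISTINCT u.uid FROM users u
    LEFT JOIN users_roles ur ON u.uid = ur.uid
    WHERE ur.rid NOT IN (4, 5)
    ORDER BY u.uid
""")]

# Filter in ON, then keep users with no matching role row (anti-join).
in_on = [r[0] for r in con.execute("""
    SELECT u.uid FROM users u
    LEFT JOIN users_roles ur ON u.uid = ur.uid AND ur.rid IN (4, 5)
    WHERE ur.rid IS NULL
    ORDER BY u.uid
""")]

print(in_where, in_on)  # [1, 2] [2, 5]
```

With the filter in ON, each user yields at most one row once ur.rid IS NULL is applied, which is why the anti-join does not produce duplicates.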