SQL: transform a UNION query into a single query

I have a database schema like this
User
id
matricule
Document
id
title
user_id(foreign key to user)
mode( can accept PUBLIC or PRIVATE)
I want to retrieve all documents that are public, plus all documents that belong to a given user (identified by matricule).
I did a union query like this :
select * from document d
Inner join user u ON u.id = d.user_id
and u.matricule ='matricule1'
UNION
select * from document d
Inner join user u ON u.id = d.user_id
where d.mode ='PUBLIC'
This works well, but can I achieve the same result another way, for example with a subquery? (I read somewhere that UNION queries are bad for performance.)
Thank you very much

select distinct *
from document d
Inner join user u ON u.id = d.user_id
where u.matricule = 'matricule1' or d.mode ='PUBLIC'
Use SELECT DISTINCT to remove duplicates, just as UNION does. (Perhaps a plain SELECT is all you need?)

Assuming you just want the columns from the document table, this can also be written as:
select *
from document d
where exists (select *
from "user" u
where u.id = d.user_id
and u.matricule = 'matricule1')
or d.mode ='PUBLIC'
This avoids the duplicate removal that UNION performs implicitly and that a JOIN solution would also need.
But you have to check the execution plan for both solutions. In some cases the UNION solution might indeed be faster than the above (or a JOIN). This depends heavily on the DBMS being used (e.g. for Postgres or Oracle I wouldn't expect a big difference at all in this case).
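To make the equivalence concrete, here is a minimal sketch that runs the three variants (UNION, JOIN with OR, EXISTS with OR) against an in-memory SQLite database. The sample rows are invented for illustration, and only d.id is selected so the results are easy to compare. It says nothing about performance on your DBMS.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "user" (id INTEGER PRIMARY KEY, matricule TEXT);
CREATE TABLE document (
    id INTEGER PRIMARY KEY, title TEXT,
    user_id INTEGER REFERENCES "user"(id), mode TEXT);
INSERT INTO "user" VALUES (1, 'matricule1'), (2, 'matricule2');
INSERT INTO document VALUES
    (1, 'a', 1, 'PUBLIC'),   -- matches both conditions
    (2, 'b', 1, 'PRIVATE'),  -- matches only the owner condition
    (3, 'c', 2, 'PUBLIC'),   -- matches only the mode condition
    (4, 'd', 2, 'PRIVATE');  -- matches neither
""")

def ids(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

union_ids = ids("""
    SELECT d.id FROM document d
    JOIN "user" u ON u.id = d.user_id
    WHERE u.matricule = 'matricule1'
    UNION
    SELECT d.id FROM document d WHERE d.mode = 'PUBLIC'
""")

or_ids = ids("""
    SELECT d.id FROM document d
    JOIN "user" u ON u.id = d.user_id
    WHERE u.matricule = 'matricule1' OR d.mode = 'PUBLIC'
""")

exists_ids = ids("""
    SELECT d.id FROM document d
    WHERE EXISTS (SELECT 1 FROM "user" u
                  WHERE u.id = d.user_id AND u.matricule = 'matricule1')
       OR d.mode = 'PUBLIC'
""")

print(union_ids, or_ids, exists_ids)
```

Because each document has exactly one owner, the JOIN with OR cannot produce duplicate documents here, so DISTINCT is not needed in this particular schema.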

Related

SQL Where on different table

SELECT * FROM student_mentor sm INNER JOIN users u
ON sm.student_id = u.user_id
WHERE sm.teacher_id = $teacher_id
Teacher_id being the session id,
I want to see all the students that have the same mentor.
Right now if I run this I just see all of the students twice, maybe one of you knows why?
My db scheme
You are not specifying which columns to join on, so you're getting a cross join (Cartesian product) where every record is joined to every other record.
You should do something like (not sure about your column names):
SELECT * FROM student_mentor sm INNER JOIN users u
ON sm.student_id = u.user_id
WHERE sm.teacher_id = $teacher_id
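A small sketch of why the missing join condition duplicates rows, using SQLite and made-up data (two students, one teacher with id 10). SQLite happens to accept INNER JOIN without ON and treats it as a cross join; other engines may reject it outright.

```python
import sqlite3

# Hypothetical data: column names follow the query in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE student_mentor (student_id INTEGER, teacher_id INTEGER);
INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO student_mentor VALUES (1, 10), (2, 10);
""")
teacher_id = 10  # stands in for $teacher_id

# No ON clause: every student_mentor row pairs with every users row.
no_on = conn.execute(
    "SELECT u.name FROM student_mentor sm INNER JOIN users u "
    "WHERE sm.teacher_id = ?",
    (teacher_id,),
).fetchall()

# With the ON clause: each student_mentor row pairs with its own user only.
with_on = conn.execute(
    "SELECT u.name FROM student_mentor sm INNER JOIN users u "
    "ON sm.student_id = u.user_id "
    "WHERE sm.teacher_id = ?",
    (teacher_id,),
).fetchall()

print(len(no_on), len(with_on))
```

With two students, the version without ON returns each name twice, which matches the symptom described in the question.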

Is it true that JOINS can be used everywhere to replace Subqueries in SQL

I have heard people say that table joins can replace subqueries everywhere. I tested this on my own query, but I only got the correct data set when I used a subquery; I was not able to get the same data set using joins. I am new to RDBMSs, so I am not sure whether my finding is right. I will try to describe (in words) the schema of the database I was experimenting with:
The database has two tables:
Users (ID, Name, City) and Friendship (ID, Friend_ID)
Goal: Users table is designed to store simple user data and Friendship table represents Friendship between users. Friendship table has both the columns as foreign keys, referencing to Users.ID. Tables have many-to-many relationship between them.
Question: I have to retrieve Users.ID and Users.Name of all the Users, which are not friends with a particular user x, but are from same city (much like fb's friend suggestion system).
By using subquery, I am able to achieve this. Query looks like:
SELECT ID, NAME
FROM USERS AS U
WHERE U.ID NOT IN (SELECT FRIENDS_ID
FROM FRIENDSHIP,
USERS
WHERE USERS.ID = FRIENDSHIP.ID AND USERS.ID = x)
AND U.ID != x AND CITY LIKE '% A_CITY%';
Example entries:
Users
Id = 1 Name = Jon City = Mumbai
Id=2 Name=Doe City=Mumbai
Id=3 Name=Arun City=Mumbai
Id=4 Name=Prakash City=Delhi
Friendship
Id= 1 Friends_Id = 2
Id = 2 Friends_Id=1
Id = 2 Friends_Id = 3
Id = 3 Friends_Id = 2
Can I get the same data set in a single query by performing joins. How? Please let me know if my question is not clear. Thanks.
Note: I used an inner join in the subquery by listing both tables (Friendship, Users). Omitting the Users table and referencing the outer alias U instead gives an error. (Without an alias for the Users table the query is syntactically valid, but the result includes the IDs and names of users who have more than one friend, including the user with ID x. Interesting, but not the topic of this question.)
For NOT IN you can use a LEFT JOIN and check for IS NULL:
select u.id, u.name
from Users u
left join Friends f on u.id = f.id and f.friend_id = #person
where u.city like '%city%' and f.friend_id is null and u.id <> #person;
There are some cases where you can't work out your way with just inner/left/right joins, but your case is not one of them.
Please check sql fiddle: http://sqlfiddle.com/#!9/1c5b1/14
Also about your note: What you tried to do can be achieved with lateral join or cross apply depending on the engine you are using.
You can rewrite your query using only joins. The trick is to join the Users table to itself with an inner join to identify users in the same city, then reference the Friendship table with a left join and a null check to identify non-friends.
SELECT
U1.ID,
U1.Name
FROM
USERS U1
INNER JOIN
USERS U2
ON
U1.CITY = U2.CITY
LEFT JOIN
FRIENDSHIP F
ON
U2.ID = F.ID AND
U1.ID = F.FRIEND_ID
WHERE
U2.id = X AND
U1.ID <> U2.id AND
F.id IS NULL
The above query doesn't handle the situation where USER x's primary key is in the FRIEND_ID column of the FRIENDSHIP table. I assume because your subquery version doesn't handle that situation, perhaps you create 2 rows for each friendship, or friendships are not bi-directional.
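As a sanity check, the sketch below loads the example rows from the question into SQLite and compares a NOT IN version with the self-join version above. The NOT IN query is simplified here (an assumption on my part): it looks up x's city with a scalar subquery instead of the LIKE pattern, so that both queries express "same city as x".

```python
import sqlite3

# Sample rows copied from the question; lower-case names are my choice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE friendship (id INTEGER, friend_id INTEGER);
INSERT INTO users VALUES
    (1, 'Jon', 'Mumbai'), (2, 'Doe', 'Mumbai'),
    (3, 'Arun', 'Mumbai'), (4, 'Prakash', 'Delhi');
INSERT INTO friendship VALUES (1, 2), (2, 1), (2, 3), (3, 2);
""")
x = 1  # the "particular user x"

# Subquery version: exclude x's friends and x itself, keep x's city.
not_in_rows = conn.execute("""
    SELECT u.id, u.name
    FROM users u
    WHERE u.id NOT IN (SELECT f.friend_id FROM friendship f WHERE f.id = :x)
      AND u.id <> :x
      AND u.city = (SELECT city FROM users WHERE id = :x)
    ORDER BY u.id
""", {"x": x}).fetchall()

# Join-only version: self-join on city, anti-join against friendship.
join_rows = conn.execute("""
    SELECT u1.id, u1.name
    FROM users u1
    INNER JOIN users u2 ON u1.city = u2.city
    LEFT JOIN friendship f ON u2.id = f.id AND u1.id = f.friend_id
    WHERE u2.id = :x
      AND u1.id <> u2.id
      AND f.id IS NULL
    ORDER BY u1.id
""", {"x": x}).fetchall()

print(not_in_rows, join_rows)
```

For x = 1 both queries suggest only Arun: Doe is already a friend and Prakash lives in another city.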
Joins and subqueries can be used to achieve similar results in some cases, but certainly not all. As an example, this query with a subquery could not be achieved with a join alone:
SELECT ID, COLUMN1, COUNT(*) FROM MYTABLE
WHERE ID IN (
SELECT DISTINCT ID FROM MYTABLE
WHERE COLUMN2 NOT IN (VALUES1, VALUES2)
)
GROUP BY ID;
This is only one example, but there are many.
Conversely, you cannot get information from another table by using a subquery without joining it.
As to your example
SELECT ID, NAME FROM USERS AS U
WHERE U.ID NOT IN (
SELECT FRIENDS_ID FROM FRIENDSHIP, USERS
WHERE USERS.ID = FRIENDSHIP.ID AND USERS.ID = x)
AND U.ID != x AND CITY LIKE '% A_CITY%';
This could be constructed as:
select ID, NAME from users u
join FRIENDSHIP f on f.ID = u.ID
where u.ID = x
and u.ID != y
and CITY like '%A_CITY';
I changed your second x to a y assumptively, so it wouldn't cause confusion.
Of course, you may also want to LEFT JOIN aka LEFT OUTER JOIN if there is a chance that there may be multiple results in the FRIENDSHIP table.

How can I get records from one table which do not exist in a related table?

I have this users table:
and this relationships table:
So each user is paired with another one in the relationships table.
Now I want to get a list of users which are not in the relationships table, in either of the two columns (user_id or pair_id).
How could I write that query?
First try:
SELECT users.id
FROM users
LEFT OUTER JOIN relationships
ON users.id = relationships.user_id
WHERE relationships.user_id IS NULL;
Output:
This should display only 2 results: 5 and 6. The result 8 is not correct, as it already exists in relationships. Of course I'm aware that the query is not correct; how can I fix it?
I'm using PostgreSQL.
You need to compare to both values in the on statement:
SELECT u.id
FROM users u LEFT OUTER JOIN
relationships r
ON u.id = r.user_id or u.id = r.pair_id
WHERE r.user_id IS NULL;
In general, OR in an ON clause can be inefficient. I would recommend replacing this with two NOT EXISTS conditions:
SELECT u.id
FROM users u
WHERE NOT EXISTS (SELECT 1 FROM relationships r WHERE u.id = r.user_id) AND
NOT EXISTS (SELECT 1 FROM relationships r WHERE u.id = r.pair_id);
I like the set operators
select id from users
except
select user_id from relationships
except
select pair_id from relationships
or
select id from users
except
(select user_id from relationships
union
select pair_id from relationships
)
This is a special case of:
Select rows which are not present in other table
I suppose this will be simplest and fastest:
SELECT u.id
FROM users u
WHERE NOT EXISTS (
SELECT 1
FROM relationships r
WHERE u.id IN (r.user_id, r.pair_id)
);
In Postgres, u.id IN (r.user_id, r.pair_id) is just short for: (u.id = r.user_id OR u.id = r.pair_id).
The expression is transformed that way internally, which can be observed from EXPLAIN ANALYZE.
To clear up speculations in the comments: Modern versions of Postgres are going to use matching indexes on user_id, and / or pair_id with this sort of query.
Something like:
select u.id
from users u
where u.id not in (select r.user_id from relationships r)
and u.id not in (select r.pair_id from relationships r)
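One caveat worth checking before choosing NOT IN over NOT EXISTS: if the subquery can return NULL, NOT IN evaluates to NULL rather than true and the row is filtered out. The sketch below shows this in SQLite with invented rows; whether pair_id can actually be NULL in the asker's table is an assumption, not something the question states.

```python
import sqlite3

# Hypothetical data: user 3 has a relationship row whose pair_id is NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE relationships (user_id INTEGER, pair_id INTEGER);
INSERT INTO users VALUES (1), (2), (3), (4);
INSERT INTO relationships VALUES (1, 2), (3, NULL);
""")

# NOT IN: "4 NOT IN (2, NULL)" is NULL, so user 4 is dropped.
not_in_ids = [row[0] for row in conn.execute("""
    SELECT u.id FROM users u
    WHERE u.id NOT IN (SELECT r.user_id FROM relationships r)
      AND u.id NOT IN (SELECT r.pair_id FROM relationships r)
    ORDER BY u.id
""")]

# NOT EXISTS: only asks whether a matching row exists, so NULLs do no harm.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT u.id FROM users u
    WHERE NOT EXISTS (
        SELECT 1 FROM relationships r
        WHERE u.id IN (r.user_id, r.pair_id))
    ORDER BY u.id
""")]

print(not_in_ids, not_exists_ids)
```

If both columns are declared NOT NULL, the two forms return the same rows and the choice comes down to the execution plan.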

Is there a better way to write this query?

So let's say I'm trying to get a list of all my users who belong to a certain user group. Easy:
SELECT *
FROM users, usergroups
WHERE usergroups.user_group_id = 1
NOW, let's say I want to get a list of all users who belong to a certain user group AND who have an email that ends in .edu.
SELECT *
FROM users, usergroups
WHERE usergroups.user_group_id = 1
AND users.email LIKE '%.edu'
That works great. Now let's say we want to get all of the above, plus users belonging to user group 2--but we don't care about the second group's email addresses. This query doesn't work:
SELECT *
FROM users, usergroups
WHERE usergroups.user_group_id IN (1,2)
AND users.email LIKE '%.edu'
Because it filters users from the second group. Right now I'm doing something like this:
SELECT *
FROM users as usrs, usergroups as groups1, usergroups as groups2
WHERE (groups1.user_group_id = 1 AND usrs.email LIKE '%.edu')
OR groups2.user_group_id = 2
This gives me the results I want, but I hate the way it looks and works. Is there a cleaner way to do this?
EDIT
I didn't include joins on my last iteration up there. Here's what the query should really look like:
SELECT *
FROM users as usrs JOIN
usergroups as groups1 on usrs.group_id = groups1.group_id JOIN
usergroups as groups2 on usrs.group_id = groups2.group_id
WHERE (groups1.user_group_id = 1 AND usrs.email LIKE '%.edu')
OR groups2.user_group_id = 2
There is no need to select usergroups twice using different aliases. You could do simply:
SELECT *
FROM users as usrs, usergroups
WHERE (usergroups.user_group_id = 1 AND usrs.email LIKE '%.edu')
OR usergroups.user_group_id = 2
or, even better (using join):
SELECT *
FROM users as usrs
JOIN usergroups on usergroups.userid = usrs.id
WHERE (usergroups.user_group_id = 1 AND usrs.email LIKE '%.edu')
OR usergroups.user_group_id = 2
The way you are doing it may work, even though it looks uglier, because of SQL syntax. What doesn't make sense to me is why there is no join between users and usergroups on user id:
... where usergroups.user_id=users.user_id
Unless I am missing something, because you are doing a cross join between users and usergroups. It would help us a whole bunch, if you listed the columns in each of your tables.
I'll go out on a limb a bit and assume there is a relationship between users and usergroups. You'd then write your query like this:
SELECT *
FROM users as usrs
INNER JOIN usergroups as groups1
ON usrs.GroupID = groups1.GroupID
WHERE (groups1.user_group_id = 1 AND usrs.email LIKE '%.edu')
OR groups1.user_group_id = 2
Fix your JOINs.
You are always returning every row from users (ignore email filter for now) once for every row in usergroups because you have no JOIN, no matter what group they belong to. You have a simple cross join/cartesian product.
Then, use UNION or UNION ALL to remove the OR. Or leave the OR in place.
SELECT *
FROM
users as usrs
JOIN
usergroups as groups1 ON usrs.foo = groups1.foo
WHERE
groups1.user_group_id = 1 AND usrs.email LIKE '%.edu'
UNION --maybe UNION ALL
SELECT *
FROM
users as usrs
JOIN
usergroups as groups2 ON usrs.foo = groups2.foo
WHERE
groups2.user_group_id = 2
I don't see anything wrong with your latest edit with the joins in place. You could do a Union but I think that'd be uglier imo.
Going off of what Michael Goldshteyn said about re-writing it using JOINS, and Joe Stefnelli's comment about the cross join, your initial query, rewritten, would be:
SELECT *
FROM users
JOIN user_groups ON users.user_group_id = user_groups.user_group_id
WHERE users.email LIKE '%.edu'
AND user_groups.user_group_id = 1
Adding the second group would result in this:
SELECT *
FROM users AS users_1
JOIN user_groups AS user_groups_1 ON users_1.user_group_id = user_groups_1.user_group_id
WHERE ( ( users_1.email LIKE '%.edu' AND user_groups_1.user_group_id = 1 )
OR user_groups_1.user_group_id = 2 )
Or you could even do a union (personally I wouldn't do this):
SELECT *
FROM users AS users_1
JOIN user_groups AS user_groups_1 ON users_1.user_group_id = user_groups_1.user_group_id
WHERE users_1.email LIKE '%.edu'
AND user_groups_1.user_group_id = 1
UNION
SELECT *
FROM users AS users_2
JOIN user_groups AS user_groups_2 ON users_2.user_group_id = user_groups_2.user_group_id
AND user_groups_2.user_group_id = 2
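A quick way to convince yourself that the OR form and the UNION form agree is to run both on a tiny data set. The sketch below uses SQLite and assumes a schema the question never confirms: users carries a user_group_id column that references user_groups. All rows are invented.

```python
import sqlite3

# Assumed schema (not confirmed by the asker) and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_groups (user_group_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, user_group_id INTEGER);
INSERT INTO user_groups VALUES (1, 'students'), (2, 'staff'), (3, 'guests');
INSERT INTO users VALUES
    (1, 'a@school.edu', 1),   -- group 1 with .edu: wanted
    (2, 'b@mail.com',   1),   -- group 1 without .edu: not wanted
    (3, 'c@mail.com',   2),   -- group 2, any email: wanted
    (4, 'd@school.edu', 3);   -- other group: not wanted
""")

# Single join with the OR condition.
or_ids = [row[0] for row in conn.execute("""
    SELECT u.id
    FROM users u
    JOIN user_groups g ON u.user_group_id = g.user_group_id
    WHERE (g.user_group_id = 1 AND u.email LIKE '%.edu')
       OR g.user_group_id = 2
    ORDER BY u.id
""")]

# Same result expressed as a UNION of two simpler queries.
union_ids = [row[0] for row in conn.execute("""
    SELECT u.id
    FROM users u
    JOIN user_groups g ON u.user_group_id = g.user_group_id
    WHERE g.user_group_id = 1 AND u.email LIKE '%.edu'
    UNION
    SELECT u.id
    FROM users u
    JOIN user_groups g ON u.user_group_id = g.user_group_id
    WHERE g.user_group_id = 2
    ORDER BY 1
""")]

print(or_ids, union_ids)
```

Since the two branches cannot overlap (a user is in group 1 or group 2, not both, under this schema), UNION ALL would return the same rows without the de-duplication step.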
Optimizing the Query.
For large tables (which may not be the case here) you should think about the performance penalties your query might incur. I prefer the following approach: first select the candidate rows into a temporary table, then delete the rows that are not needed, and finally select the result set and drop the temporary table. Note: this query uses Transact-SQL.
select u.*, g.user_group_id into #TEMP from users u, usergroups g where u.group_id = g.group_id and g.user_group_id in (1,2)
delete from #TEMP where user_group_id = 1 and email not like '%.edu'
select * from #TEMP
drop table #TEMP
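The same staged approach can be sketched outside Transact-SQL. The version below uses a SQLite temporary table (named staged here, my choice) in place of #TEMP, with the column names from the query above and invented rows, and checks that it returns the same users as a single query with OR. Whether staging is actually faster depends on the engine and data volume, so measure before adopting it.

```python
import sqlite3

# Column names follow the T-SQL answer; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usergroups (group_id INTEGER PRIMARY KEY, user_group_id INTEGER);
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, group_id INTEGER);
INSERT INTO usergroups VALUES (10, 1), (20, 2), (30, 3);
INSERT INTO users VALUES
    (1, 'a@school.edu', 10), (2, 'b@mail.com', 10),
    (3, 'c@mail.com', 20),   (4, 'd@school.edu', 30);
""")

# Step 1: stage candidate rows. Step 2: delete the ones that fail the filter.
conn.executescript("""
CREATE TEMP TABLE staged AS
    SELECT u.*, g.user_group_id
    FROM users u
    JOIN usergroups g ON u.group_id = g.group_id
    WHERE g.user_group_id IN (1, 2);
DELETE FROM staged
    WHERE user_group_id = 1 AND email NOT LIKE '%.edu';
""")

# Step 3: read the result, then drop the temporary table.
staged_ids = sorted(row[0] for row in conn.execute("SELECT id FROM staged"))
conn.execute("DROP TABLE staged")

# Reference: the single-query form with OR.
single_ids = sorted(row[0] for row in conn.execute("""
    SELECT u.id
    FROM users u
    JOIN usergroups g ON u.group_id = g.group_id
    WHERE (g.user_group_id = 1 AND u.email LIKE '%.edu')
       OR g.user_group_id = 2
"""))

print(staged_ids, single_ids)
```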

Selecting records in SQL based on another table's contents

I'm a bit new to SQL and have trouble constructing a select statement. I have two tables:
Table users
int id
varchar name
Table properties
int userID
int property
and I want all user records which have a certain property. Is there a way to get them in one SQL call or do I need to first get all userIDs from the properties table and then select each user individually?
Use a JOIN:
SELECT U.id, U.name, P.property FROM users U
INNER JOIN properties P ON P.userID = U.id
WHERE property = 3
If there's only one property row per user you want to select on, I think this is what you want:
select
users.*
from
users,
properties
where
users.id = properties.userID
and properties.property = (whatnot);
If you have multiple property rows matching "whatnot" and you only want one, depending on your database system, you either want a left join or a distinct clause.
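To illustrate the duplicate-row case, here is a minimal SQLite sketch with invented data in which one user has the same property recorded twice. It compares a plain join, the same join with DISTINCT, and an EXISTS subquery. The EXISTS form is an alternative I am adding for comparison, not something proposed in the answer above.

```python
import sqlite3

# Table and column names follow the question; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE properties (userID INTEGER, property INTEGER);
INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO properties VALUES (1, 3), (1, 3), (2, 5);
""")
wanted = 3  # the property value to search for

# Plain join: one output row per matching property row.
join_rows = conn.execute("""
    SELECT u.id, u.name
    FROM users u
    INNER JOIN properties p ON p.userID = u.id
    WHERE p.property = ?
""", (wanted,)).fetchall()

# DISTINCT collapses the repeated user rows.
distinct_rows = conn.execute("""
    SELECT DISTINCT u.id, u.name
    FROM users u
    INNER JOIN properties p ON p.userID = u.id
    WHERE p.property = ?
""", (wanted,)).fetchall()

# EXISTS never multiplies rows, so no DISTINCT is needed.
exists_rows = conn.execute("""
    SELECT u.id, u.name
    FROM users u
    WHERE EXISTS (SELECT 1 FROM properties p
                  WHERE p.userID = u.id AND p.property = ?)
""", (wanted,)).fetchall()

print(join_rows, distinct_rows, exists_rows)
```

All three run as a single SQL call, so there is no need to fetch the userIDs first and then query each user separately.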
Check out the JOIN command. You could write a query like the following:
SELECT
name
FROM
users u
INNER JOIN properties p
ON u.id = p.userID
WHERE
p.property = <some value>
You're looking to JOIN tables.
Assuming the id and userID columns have the same meaning, it's like this:
select u.name
from users u inner join properties p
on u.id = p.userID
where p.property = :ValueToFind
SELECT [Name] FROM Users u
JOIN Properties p on p.UserID=u.ID
WHERE p.Property=1
Obviously it depends on which flavour of RDBMS and T-SQL you are using.