How to find distinct users in multiple tables - sql

I have a table called users that holds user IDs, as well as a few tables like cloud_storage_a, cloud_storage_b and cloud_storage_c. If a user exists in cloud_storage_a, that means they are connected to cloud storage a. A user can exist in many cloud storages too. Here's an example:
users table:
user_id | address | name
-------------------------------
123 | 23 Oak Ave | Melissa
333 | 18 Robson Rd | Steve
421 | 95 Ottawa St | Helen
555 | 12 Highland | Amit
192 | 39 Anchor Rd | Oliver
cloud_storage_a:
user_id
-------
421
333
cloud_storage_b:
user_id
-------
555
cloud_storage_c:
user_id
-------
192
555
Etc.
I want to create a query that grabs all users connected on any cloud storage. So for this example, users 421, 333, 555, 192 should be returned. I'm guessing this is some sort of join but I'm not sure which one.

You are close. Instead of a JOIN, which merges tables side by side based on a key, you want a UNION, which stacks recordsets/tables on top of each other.
SELECT user_id FROM cloud_storage_a
UNION
SELECT user_id FROM cloud_storage_b
UNION
SELECT user_id FROM cloud_storage_c
Using the UNION keyword here gives you distinct user_id values across all three tables. If you switched to UNION ALL, duplicates would no longer be removed, which has its advantages in other situations (not here, obviously).
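To make the UNION versus UNION ALL difference concrete, here is a minimal sketch that runs both forms against the sample data from the question. It assumes an in-memory SQLite database driven from Python purely as a convenient stand-in; the SQL itself should behave the same way on other engines.

```python
import sqlite3

# In-memory SQLite database, used only as a convenient stand-in engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cloud_storage_a (user_id INTEGER);
    CREATE TABLE cloud_storage_b (user_id INTEGER);
    CREATE TABLE cloud_storage_c (user_id INTEGER);
    INSERT INTO cloud_storage_a VALUES (421), (333);
    INSERT INTO cloud_storage_b VALUES (555);
    INSERT INTO cloud_storage_c VALUES (192), (555);
""")

query = """
    SELECT user_id FROM cloud_storage_a
    {op}
    SELECT user_id FROM cloud_storage_b
    {op}
    SELECT user_id FROM cloud_storage_c
"""

# UNION removes duplicates; UNION ALL keeps every row from every table.
union_ids = sorted(row[0] for row in conn.execute(query.format(op="UNION")))
union_all_ids = sorted(row[0] for row in conn.execute(query.format(op="UNION ALL")))

print(union_ids)      # [192, 333, 421, 555]
print(union_all_ids)  # [192, 333, 421, 555, 555]
```

User 555 appears in both cloud_storage_b and cloud_storage_c, so it shows up twice only in the UNION ALL result.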
Edited to add:
If you wanted to bring in the user's address, you could use this as a subquery and join it to your users table:
SELECT
subunion.user_id,
users.address
FROM
users
INNER JOIN
(
SELECT user_id FROM cloud_storage_a
UNION
SELECT user_id FROM cloud_storage_b
UNION
SELECT user_id FROM cloud_storage_c
) subunion ON
users.user_id = subunion.user_id
That union will need to grow as you add more cloud_storage_N tables. All in all, it's not a great database design. You would be much better off creating a single cloud_storage table with a column that identifies which storage each row belongs to (a, b, c, ..., N).
Then your UNION query would just be SELECT DISTINCT user_id FROM cloud_storage; and you would never need to edit it again.

With this schema you have to combine an unknown(?) number of cloud_storage_X tables.
You'd be better off changing your schema to the following:
storage:
user_id cloud
------- -----
421 a
333 a
555 b
192 c
555 c
Then the query is as simple as this:
select distinct user_id
from storage;
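As a rough sketch of the single-table design suggested above, the following creates the storage table with the sample rows and runs the DISTINCT query. The composite primary key, the column types, and the extra cloud 'd' are assumptions added for illustration; SQLite in memory is used only as a stand-in engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per (user, cloud) pair; the composite key (an assumption)
    -- prevents the same link from being stored twice.
    CREATE TABLE storage (
        user_id INTEGER NOT NULL,
        cloud   TEXT    NOT NULL,
        PRIMARY KEY (user_id, cloud)
    );
    INSERT INTO storage VALUES
        (421, 'a'), (333, 'a'), (555, 'b'), (192, 'c'), (555, 'c');
""")

distinct_sql = "SELECT DISTINCT user_id FROM storage"
connected = sorted(r[0] for r in conn.execute(distinct_sql))

# Connecting a user to a new (hypothetical) cloud 'd' is just another row;
# the query text does not change.
conn.execute("INSERT INTO storage VALUES (123, 'd')")
connected_after = sorted(r[0] for r in conn.execute(distinct_sql))

print(connected)        # [192, 333, 421, 555]
print(connected_after)  # [123, 192, 333, 421, 555]
```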

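-- DISTINCT is needed: the comma (cross) join repeats each matching user once per row combination.
-- Note that this form returns no rows at all if any one of the cloud_storage tables is empty.
select distinct u.* from users u,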
cloud_storage_a csa,
cloud_storage_b csb,
cloud_storage_c csc
where u.user_id = csa.user_id or u.user_id = csb.user_id or u.user_id = csc.user_id
You should simplify your schema to handle this type of query.

To get columns from your users table for all (distinct) qualifying users:
SELECT * -- or whatever you need
FROM users u
WHERE EXISTS (SELECT 1 FROM cloud_storage_a WHERE user_id = u.user_id) OR
EXISTS (SELECT 1 FROM cloud_storage_b WHERE user_id = u.user_id) OR
EXISTS (SELECT 1 FROM cloud_storage_c WHERE user_id = u.user_id);
To just get all user_id values and nothing else, @JNevill's UNION query looks good. You could join the result of this to users to the same effect:
SELECT u.* -- or whatever you need
FROM users u
JOIN (
SELECT user_id FROM cloud_storage_a
UNION
SELECT user_id FROM cloud_storage_b
UNION
SELECT user_id FROM cloud_storage_c
) c USING (user_id);
But that's probably slower.
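The two approaches in this answer can be compared side by side. The sketch below loads the sample data from the question into an in-memory SQLite database (an assumption made only so the example runs anywhere) and checks that the EXISTS form and the JOIN-on-UNION form return the same users. It says nothing about relative speed, which depends on the engine and the indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, address TEXT, name TEXT);
    CREATE TABLE cloud_storage_a (user_id INTEGER);
    CREATE TABLE cloud_storage_b (user_id INTEGER);
    CREATE TABLE cloud_storage_c (user_id INTEGER);
    INSERT INTO users VALUES
        (123, '23 Oak Ave', 'Melissa'), (333, '18 Robson Rd', 'Steve'),
        (421, '95 Ottawa St', 'Helen'), (555, '12 Highland', 'Amit'),
        (192, '39 Anchor Rd', 'Oliver');
    INSERT INTO cloud_storage_a VALUES (421), (333);
    INSERT INTO cloud_storage_b VALUES (555);
    INSERT INTO cloud_storage_c VALUES (192), (555);
""")

# Semi-join: each user is tested once, so no duplicates can appear.
exists_sql = """
    SELECT u.user_id, u.name FROM users u
    WHERE EXISTS (SELECT 1 FROM cloud_storage_a WHERE user_id = u.user_id)
       OR EXISTS (SELECT 1 FROM cloud_storage_b WHERE user_id = u.user_id)
       OR EXISTS (SELECT 1 FROM cloud_storage_c WHERE user_id = u.user_id)
    ORDER BY u.user_id
"""

# Join against the de-duplicated UNION of all storage tables.
join_sql = """
    SELECT u.user_id, u.name FROM users u
    JOIN (
        SELECT user_id FROM cloud_storage_a
        UNION
        SELECT user_id FROM cloud_storage_b
        UNION
        SELECT user_id FROM cloud_storage_c
    ) c USING (user_id)
    ORDER BY u.user_id
"""

exists_rows = conn.execute(exists_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(exists_rows)  # [(192, 'Oliver'), (333, 'Steve'), (421, 'Helen'), (555, 'Amit')]
```

Melissa (123) is in no storage table, so both queries leave her out.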

Related

Display user name and their city who have booked their tickets not by using HDFC bank for any of the bookings

Display user name and their city who have booked their tickets not by using HDFC bank for any of the bookings. Sort the result based on the user name.
This is the Schema
This is my code.
select distinct u.name,u.address
from users u join bookingdetails b on b.user_id=u.user_id
where lower(b.name) !='hdfc'
order by u.name;
I am getting the expected output but passing only one of the test cases (I can't see the test cases; it only shows pass or fail). I think this query can be written in a more effective way.
Data for the users table.
NAME USER_ID ADDRESS
------------------- ---------- ---------
Jaya 6 Chennai
Krena 5 Mumbai
Johan 4 Delhi
Ivan 3 Chennai
Tom 2 Hyderabad
John 1 Bangalore
Data for the bookingdetails table.
BD_ID ACC_NO NAME USER_ID
------- ------ ---- ---------
1001 1234 SBI 1
1002 5623 KVB 5
1003 9876 ICICI 4
1004 9193 HDFC 2
1005 8397 HDFC 3
1006 1234 SBI 1
expected output:
NAME ADDRESS
---------------- ----------
Johan Delhi
John Bangalore
Krena Mumbai
Also, this is the output (given above) of your edited query, and it also passes only one test case.
select distinct u.name, u.address from users u
join bookingdetails b on b.user_id = u.user_id
where u.user_id not in
( select user_id from bookingdetails where name='HDFC')
order by u.name;
This will hopefully pass both of your test cases.
Since you need only the columns of one table (users), you may convert your query into NOT EXISTS
select u.name,u.address
from users u where not exists
( select 1 from bookingdetails b
where b.user_id=u.user_id
and lower(b.name) ='hdfc'
) order by u.name;
Also, applying lower() may degrade performance when the data volume is high, because an ordinary index on name may not be used. So, in a real-world scenario (unlike the assignment you are currently trying to complete), either store names in a single case (lower or upper) or use a function-based index on lower(name).
EDIT: If you want to exclude users who aren't present in bookingdetails, you may use your original query or an EXISTS condition.
select u.name,u.address
from users u where exists
( select 1 from bookingdetails b
where b.user_id=u.user_id
and lower(b.name) != 'hdfc'
) order by u.name;
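The difference between the variants is easiest to see on the sample data. The sketch below, which assumes an in-memory SQLite database as a stand-in engine, runs the NOT EXISTS query and the JOIN plus NOT IN query from this thread. Jaya has no bookings at all, so she appears only in the NOT EXISTS result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name TEXT, user_id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE bookingdetails (bd_id INTEGER, acc_no INTEGER, name TEXT, user_id INTEGER);
    INSERT INTO users VALUES
        ('Jaya', 6, 'Chennai'), ('Krena', 5, 'Mumbai'), ('Johan', 4, 'Delhi'),
        ('Ivan', 3, 'Chennai'), ('Tom', 2, 'Hyderabad'), ('John', 1, 'Bangalore');
    INSERT INTO bookingdetails VALUES
        (1001, 1234, 'SBI', 1), (1002, 5623, 'KVB', 5), (1003, 9876, 'ICICI', 4),
        (1004, 9193, 'HDFC', 2), (1005, 8397, 'HDFC', 3), (1006, 1234, 'SBI', 1);
""")

# Users with no HDFC booking, including users with no bookings at all.
not_exists_rows = conn.execute("""
    SELECT u.name, u.address FROM users u
    WHERE NOT EXISTS (
        SELECT 1 FROM bookingdetails b
        WHERE b.user_id = u.user_id AND lower(b.name) = 'hdfc'
    )
    ORDER BY u.name
""").fetchall()

# Users with at least one booking (inner join) and no HDFC booking.
join_not_in_rows = conn.execute("""
    SELECT DISTINCT u.name, u.address FROM users u
    JOIN bookingdetails b ON b.user_id = u.user_id
    WHERE u.user_id NOT IN (SELECT user_id FROM bookingdetails WHERE name = 'HDFC')
    ORDER BY u.name
""").fetchall()

print(not_exists_rows)
# [('Jaya', 'Chennai'), ('Johan', 'Delhi'), ('John', 'Bangalore'), ('Krena', 'Mumbai')]
print(join_not_in_rows)
# [('Johan', 'Delhi'), ('John', 'Bangalore'), ('Krena', 'Mumbai')]
```

Only the second result matches the expected output in the question, which suggests the test cases require the user to have made at least one booking.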
select distinct u.name, u.address from users u join bookingdetails b
on b.user_id = u.user_id
where u.user_id not in
(select user_id from bookingdetails where name = 'HDFC')
order by u.name;
This gives the correct answer shown above, and both test cases should pass.

SQL recursive CTE returns more results than expected

Using SQL I'm trying to get a list of ids of a user's descendants. So, say I have:
Id | ManagerId | Name
---|-----------|------
4 | NULL | James Smith
7 | 4 | John Doe (1)
8 | 7 | John Doe (2)
9 | 8 | John Doe (3)
10 | 8 | John Doe (4)
And I want to get back 4, 7, 8, 9, 10. Looking around online the recommended approach is to create a recursive CTE, which I've done as:
WITH UsersCTE AS (
SELECT Id,
ManagerId
FROM Users
WHERE IsActive = 1
UNION ALL
SELECT A.Id,
UsersCTE.Id
FROM UsersCTE
INNER JOIN Users AS A
ON A.ManagerId = UsersCTE.ManagerId
)
SELECT *
FROM UsersCTE;
Which kind of works. It starts out by getting the ids that I want, and then goes off and starts returning ids in all kinds of ways. I'll confess that I have no idea how it works, but it's not giving me the result I want. Instead it's giving me this:
How can I get it to give me just the ids I want? For reference, I need to get this list so I can then query back the db and get a list of orders for the current users and any descendants they manage. I am using EF6 for the data access and was planning on making this CTE into a view and querying it appropriately, but I'm open to better recommendations.
Rewrite as follows:
WITH UsersCTE AS (
SELECT Id,
ManagerId
FROM Users
WHERE IsActive = 1 and ManagerID is null
UNION ALL
SELECT A.Id,
UsersCTE.Id
FROM UsersCTE
INNER JOIN Users AS A
ON A.ManagerId = UsersCTE.Id and A.IsActive = 1
)
SELECT *
FROM UsersCTE;
The null filter in the first part anchors your CTE i.e. gives it a base case of employees with no managers. The other change is in the join condition of the second part. The updated condition matches manager ID to employee ID.
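The anchored CTE can be tried out on the sample hierarchy from the question. This is a sketch under assumptions: it uses an in-memory SQLite database, where the RECURSIVE keyword is written explicitly (SQL Server, which the question appears to target, omits it), and it assumes every sample user has IsActive = 1 since the question does not show that column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (Id INTEGER PRIMARY KEY, ManagerId INTEGER, Name TEXT, IsActive INTEGER);
    -- IsActive = 1 for every row is an assumption; the question omits the column.
    INSERT INTO Users VALUES
        (4, NULL, 'James Smith', 1),
        (7, 4, 'John Doe (1)', 1),
        (8, 7, 'John Doe (2)', 1),
        (9, 8, 'John Doe (3)', 1),
        (10, 8, 'John Doe (4)', 1);
""")

rows = conn.execute("""
    WITH RECURSIVE UsersCTE(Id, ManagerId) AS (
        -- Anchor: start only from users who have no manager.
        SELECT Id, ManagerId FROM Users
        WHERE IsActive = 1 AND ManagerId IS NULL
        UNION ALL
        -- Recursive step: users whose manager is already in the result.
        SELECT A.Id, UsersCTE.Id
        FROM UsersCTE
        INNER JOIN Users AS A
            ON A.ManagerId = UsersCTE.Id AND A.IsActive = 1
    )
    SELECT Id, ManagerId FROM UsersCTE
""").fetchall()

descendant_ids = sorted(r[0] for r in rows)
print(descendant_ids)  # [4, 7, 8, 9, 10]
```

To start from a specific manager rather than the top of the tree, the anchor's filter could be changed to something like WHERE Id = ?, which is a variation not covered by the answer above.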

Selecting Records Matching Two or More Related Tables

I have a 'persons' table:
person_id name
100 jack
125 jill
201 jane
And many sub-tables, that the person_id could be in:
'rowing'
id person_id
1 100
2 201
'swimming'
id person_id
1 125
2 201
'running'
id person_id
1 201
'throwing'
id person_id
1 125
2 201
I would like to be able to select all people who are involved in two activities, regardless of which two.
As the great @TimSchmelter (great first name) mentioned, you should really have a single PersonActivities table with an id corresponding to the particular activity.
That being said, if you must work with your current schema, one option would be to UNION together the activity tables, and then count which persons have two or more records, meaning that they participated in two or more activities.
SELECT t1.person_id, t1.name
FROM persons t1
INNER JOIN
(
SELECT t.person_id, COUNT(t.person_id) AS activityCount
FROM
(
SELECT person_id FROM rowing
UNION ALL
SELECT person_id FROM swimming
UNION ALL
SELECT person_id FROM running
UNION ALL
SELECT person_id FROM throwing
) AS t
GROUP BY t.person_id
HAVING COUNT(t.person_id) > 1
) t2
ON t1.person_id = t2.person_id
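Here is a small runnable sketch of the UNION ALL plus COUNT approach using the sample data from the question. It assumes an in-memory SQLite database as a stand-in, and it relies on each person appearing at most once per activity table, as in the sample rows; duplicates within a single table would inflate the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE rowing   (id INTEGER, person_id INTEGER);
    CREATE TABLE swimming (id INTEGER, person_id INTEGER);
    CREATE TABLE running  (id INTEGER, person_id INTEGER);
    CREATE TABLE throwing (id INTEGER, person_id INTEGER);
    INSERT INTO persons  VALUES (100, 'jack'), (125, 'jill'), (201, 'jane');
    INSERT INTO rowing   VALUES (1, 100), (2, 201);
    INSERT INTO swimming VALUES (1, 125), (2, 201);
    INSERT INTO running  VALUES (1, 201);
    INSERT INTO throwing VALUES (1, 125), (2, 201);
""")

multi_activity = conn.execute("""
    SELECT t1.person_id, t1.name
    FROM persons t1
    INNER JOIN (
        -- UNION ALL keeps one row per (person, activity), so COUNT works.
        SELECT t.person_id, COUNT(t.person_id) AS activityCount
        FROM (
            SELECT person_id FROM rowing
            UNION ALL
            SELECT person_id FROM swimming
            UNION ALL
            SELECT person_id FROM running
            UNION ALL
            SELECT person_id FROM throwing
        ) AS t
        GROUP BY t.person_id
        HAVING COUNT(t.person_id) > 1
    ) t2 ON t1.person_id = t2.person_id
    ORDER BY t1.person_id
""").fetchall()

print(multi_activity)  # [(125, 'jill'), (201, 'jane')]
```

Using plain UNION here would collapse each person to a single row and make every count equal to 1, which is why UNION ALL is the right choice for this query.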

Select Distinct with MAX() using SQL Server

I am trying to get all distinct account numbers from 3 tables in SQL Server, but it seems that my way doesn't work, any suggestions?
SELECT distinct account, max(date_added) as date_added FROM table_one group by account
union
SELECT distinct account, max(date_added) as date_added, FROM table_two group by account
union
SELECT distinct account, max(date_added) as date_added, FROM table_three group by account order by account asc
1
SELECT account, MAX(date_added) AS date_added
FROM table_one
GROUP BY account
UNION
SELECT account, MAX(date_added) AS date_added
FROM table_two
GROUP BY account
UNION
SELECT account, MAX(date_added) AS date_added
FROM table_three
GROUP BY account
ORDER BY account ASC
2
SELECT account, MAX(date_added) AS date_added
FROM (
SELECT account, date_added
FROM table_one
UNION ALL
SELECT account, date_added
FROM table_two
UNION ALL
SELECT account, date_added
FROM table_three
) t
GROUP BY account
ORDER BY account ASC
Generate a set of data combining the results then get the max date for each account. This generates an inline view from which we can get a distinct account and max date added.
SELECT account, max(date_Added) as Date_Added from (
SELECT account, date_Added FROM table_one
union
SELECT account, date_added FROM table_two
union
SELECT account, date_added FROM table_three) B
Group by account
If you want a list with all accounts and each account only appears once, then you could do something like:
select distinct account from table_one
union
select distinct account from table_two
union
select distinct account from table_three
By using union you will not get duplicate rows, and since you are only selecting account you will get each account only once.
It is however unclear what you are doing with the date_added column.
select account, MAX(date_added)
FROM (
SELECT account, date_added FROM table_one
union
SELECT account, date_added FROM table_two
union
SELECT account, date_added FROM table_three
) X
group by account
order by account asc
So what this query does is union all the data from the three tables into one recordset, so you can work with it as if it were one table. Imagine that you have 3 accounts, 1, 2 and 3, and the following data for them is spread across those 3 tables:
Table One
Account | Date Added
--------+-----------
1 | 01-01-2015
2 | 01-02-2015
Table Two
Account | Date Added
--------+-----------
3 | 01-03-2015
1 | 01-02-2015
Table Three
Account | Date Added
--------+-----------
2 | 01-04-2015
3 | 01-05-2015
So after the union of all those records from the three tables, we get the following 'table' (actually a recordset stored in memory):
Union Data
Account | Date Added
--------+-----------
1 | 01-01-2015
2 | 01-02-2015
3 | 01-03-2015
1 | 01-02-2015
2 | 01-04-2015
3 | 01-05-2015
Then we just select the account and the latest date_added for each distinct account found within this set, so we get the following results:
Result Data
Account | Date Added
--------+-----------
1 | 01-02-2015
2 | 01-04-2015
3 | 01-05-2015
Feel free to ask any other questions you get related to this.
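The walkthrough above can be reproduced with a short script. This sketch assumes an in-memory SQLite database and stores the dates as ISO strings (reading the sample values as month-day-year, which is an assumption) so that MAX compares them correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_one   (account INTEGER, date_added TEXT);
    CREATE TABLE table_two   (account INTEGER, date_added TEXT);
    CREATE TABLE table_three (account INTEGER, date_added TEXT);
    -- ISO yyyy-mm-dd strings sort chronologically (format is an assumption).
    INSERT INTO table_one   VALUES (1, '2015-01-01'), (2, '2015-01-02');
    INSERT INTO table_two   VALUES (3, '2015-01-03'), (1, '2015-01-02');
    INSERT INTO table_three VALUES (2, '2015-01-04'), (3, '2015-01-05');
""")

latest_per_account = conn.execute("""
    SELECT account, MAX(date_added) AS date_added
    FROM (
        SELECT account, date_added FROM table_one
        UNION ALL
        SELECT account, date_added FROM table_two
        UNION ALL
        SELECT account, date_added FROM table_three
    ) x
    GROUP BY account
    ORDER BY account ASC
""").fetchall()

print(latest_per_account)
# [(1, '2015-01-02'), (2, '2015-01-04'), (3, '2015-01-05')]
```

UNION ALL is used in the inner query because the outer GROUP BY already collapses duplicates, so the extra de-duplication step of UNION is not needed for this result.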

MySql Join a View Table as a Boolean

I have a users table, and a view table which lists some user ids... They look something like this:
Users:
User_ID | Name | Age | ...
555 John Doe 35
556 Jane Doe 24
557 John Smith 18
View_Table
User_ID
555
557
Now, when I do run a query to retrieve a user:
SELECT User_ID,Name,Age FROM Users WHERE User_ID = 555
SELECT User_ID,Name,Age FROM Users WHERE User_ID = 556
I also would like to select a boolean, stating whether or not the user I'm retrieving is present in the View_Table.
Result:
User_ID Name Age In_View
555 John Doe 35 1
556 Jane Doe 24 0
Any help would be greatly appreciated. Efficiency is a huge plus. Thanks!!
SELECT Users.User_ID,Name,Age, View_Table.User_ID IS NOT NULL AS In_View
FROM Users
LEFT JOIN View_table USING (User_ID)
WHERE User_ID = 555
SELECT
u.User_ID, u.Name, u.Age,
CASE WHEN v.User_ID IS NOT NULL THEN 1 ELSE 0 END AS In_View
FROM Users u
LEFT JOIN View_Table v ON u.User_ID = v.User_ID
WHERE u.User_ID ...;
I would do a LEFT JOIN. So long as you have key/index for User_ID, it should be very efficient.
SELECT User_ID,Name,Age, IF(View_Table.User_ID, 1, 0) AS In_View
FROM Users LEFT JOIN View_Table USING(User_ID)
WHERE User_ID = 555
I know this is an "Old" question but just happened upon this and none of these answers seemed to be that good. So I thought I would throw in my 2 cents
SELECT
u.User_ID,
u.Name,
u.Age,
COALESCE((SELECT 1 FROM View_Table AS v WHERE v.User_ID = u.User_ID ), 0) AS In_View
FROM
Users AS u
WHERE
u.User_ID = 555
Simply select 1 with a correlated subquery (which yields NULL when there is no match); to get the 0, use the handy COALESCE function, which returns its first non-null argument from left to right.
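As a quick check that the LEFT JOIN form and the COALESCE form agree, here is a sketch that runs both for users 555 and 556 from the question. It assumes an in-memory SQLite database instead of MySQL and models View_Table as an ordinary table with unique User_ID values; the helper function name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (User_ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
    -- Modeled as a plain table with unique ids (an assumption).
    CREATE TABLE View_Table (User_ID INTEGER PRIMARY KEY);
    INSERT INTO Users VALUES (555, 'John Doe', 35), (556, 'Jane Doe', 24), (557, 'John Smith', 18);
    INSERT INTO View_Table VALUES (555), (557);
""")

left_join_sql = """
    SELECT u.User_ID, u.Name, u.Age, v.User_ID IS NOT NULL AS In_View
    FROM Users u
    LEFT JOIN View_Table v ON v.User_ID = u.User_ID
    WHERE u.User_ID = ?
"""

coalesce_sql = """
    SELECT u.User_ID, u.Name, u.Age,
           COALESCE((SELECT 1 FROM View_Table AS v WHERE v.User_ID = u.User_ID), 0) AS In_View
    FROM Users AS u
    WHERE u.User_ID = ?
"""

def fetch_user(sql, user_id):
    """Hypothetical helper: run one of the queries above for a single user."""
    return conn.execute(sql, (user_id,)).fetchone()

in_view_555 = fetch_user(left_join_sql, 555)
in_view_556 = fetch_user(left_join_sql, 556)
print(in_view_555)  # (555, 'John Doe', 35, 1)
print(in_view_556)  # (556, 'Jane Doe', 24, 0)
```

Passing the user id as a bound parameter (the ? placeholder) rather than formatting it into the SQL string keeps the query safe to reuse with untrusted input.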