Let's say I have many tables with different structures that share a common column. How can I query for all rows from all of these tables based on a condition?
Example:
table1:
column1 | column2 | user_id
table2:
columna | columnb | columnc | user_id
...
The condition would be user_id = <some number>. I don't want to query each table individually as there are about 30 tables. There may not be a record for each user_id in each table. What's the best option to do this?
Sounds as if you are looking for a full outer join
select *
from (
select *
from table1 t1
full outer join table2 t2 using (user_id)
full outer join table3 t3 using (user_id)
) t
where user_id = 42;
The using (user_id) syntax will make sure that the common column user_id is only present once in the result. So even though the query uses select *, there will only be a single user_id column in the result on which you can apply the where condition.
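The effect of using on select * can be checked with a small script. This is only a sketch: it runs on an in-memory SQLite database through Python's sqlite3 module, the sample rows are made up, and it uses a left join instead of a full outer join because older SQLite builds do not support full outer join (so a user that exists only in table2 would be missed here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 TEXT, column2 TEXT, user_id INTEGER);
    CREATE TABLE table2 (columna TEXT, columnb TEXT, columnc TEXT, user_id INTEGER);
    -- made-up sample rows, not from the question
    INSERT INTO table1 VALUES ('a', 'b', 42), ('c', 'd', 7);
    INSERT INTO table2 VALUES ('x', 'y', 'z', 42);
""")

cur = conn.execute("""
    SELECT *
    FROM table1 t1
    LEFT JOIN table2 t2 USING (user_id)
    WHERE user_id = 42
""")
# cursor.description holds one entry per result column
column_names = [d[0] for d in cur.description]
rows = cur.fetchall()
print(column_names)
print(rows)
```

Because user_id shows up only once in the joined result, the unqualified user_id in the where clause is not ambiguous.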
You can join the tables on user_id.
An inner join would be good.
You have to set primary keys and foreign keys.
If you have a relationship or a common column between the tables, like user_id in your case, you can perform a simple JOIN like
select t1.*, t2.*
from table1 t1 join table2 t2 on t1.user_id = t2.user_id
where t1.user_id = <some number>;
But if no relation exists, or you can't join them, then there is no other way than querying them individually.
I think in this very case an outer join would serve the purpose instead of an inner join, as there may not be a record for each user_id in each table.
On another note, I strongly feel that joining many tables (especially 30) can hold locks on those tables for a long time, which can hurt your DB as well as your application. Restructuring the DB is one option. If you can't change it, group similar data into clusters of 4-5 tables each, which gives 6-7 queries in total, and run them from multiple threads in the application, with each thread fetching the data for its own query. Then combine the results into the required set of information. This should improve your application's performance.
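If the queries do end up being issued from the application, they don't have to be written by hand. The following is a rough sketch, assuming Python's sqlite3 module, an in-memory database and made-up rows. The helper name rows_for_user is hypothetical, and threading is left out to keep the example short.

```python
import sqlite3

def rows_for_user(conn, tables, user_id):
    """Return {table_name: rows} for one user_id, skipping tables without a match."""
    result = {}
    for table in tables:
        # Table names cannot be bound as parameters, so they must come from a
        # trusted, hard-coded list. The user_id value itself is bound safely.
        sql = f'SELECT * FROM "{table}" WHERE user_id = ?'
        rows = conn.execute(sql, (user_id,)).fetchall()
        if rows:
            result[table] = rows
    return result

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 TEXT, column2 TEXT, user_id INTEGER);
    CREATE TABLE table2 (columna TEXT, columnb TEXT, columnc TEXT, user_id INTEGER);
    CREATE TABLE table3 (note TEXT, user_id INTEGER);  -- hypothetical third table
    INSERT INTO table1 VALUES ('a', 'b', 42);
    INSERT INTO table2 VALUES ('x', 'y', 'z', 42);
    INSERT INTO table3 VALUES ('other user', 7);
""")

found = rows_for_user(conn, ["table1", "table2", "table3"], 42)
print(found)
```

Each table keeps its own column layout in the result, which a single wide join across 30 differently structured tables cannot offer.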
Related
Query
SELECT ID, Name, Phone
FROM Table1
LEFT JOIN Table2 ON Table1.ID = Table2.ID
WHERE Table2.ID IS NULL
Problem
Finding it hard to understand why someone would left join on an ID
and then set it to NULL in the where clause?
Am I missing something here? Is there any significance to this?
Could we just omit the Table2 altogether? As in not join at all?
Any help would be much appreciated.
The query you have in the question is basically equivalent to the following query:
SELECT ID, Name, Phone
FROM Table1
WHERE NOT EXISTS
(
SELECT 1
FROM Table2
WHERE Table1.ID = Table2.ID
)
Meaning it selects all the records in Table1 that do not have a correlated record in Table2.
The execution plan for both queries will most likely be the same (Personally, I've never seen a case when they produce a different execution plan, but I don't rule that out), so both queries should be equally efficient, and it's up to you to decide whether the left join or the exists syntax is more readable to you.
I think you should have an alias for each of your tables and specify which table each column is coming from.
Assuming Name is from table one, Phone is from table two and ID is common to both, the left join mentioned above may help get all users that do not have phone numbers.
Table 1
Id  Name
1   John Smith
2   Jane Doe
Table 2
Id  Phone
2   071 555 0863
Left Join without the where clause
ID  Name        Phone
1   John Smith  NULL
2   Jane Doe    071 555 0863
Left Join with the where clause
ID  Name        Phone
1   John Smith  NULL
This is one of the ways to implement the relational database operation of antijoin, called anti semi join in SQL Server's terminology. This is essentially "bring rows from one table that are not in another table".
The ways I can think of doing this are:
select cols from t1 left join t2 on t1.key=t2.key where t2.key is null
select cols from t1 where key not in (select key from t2)
select cols from t1 where not exists (select 1 from t2 where t1.key=t2.key)
and even
select * from t1 where key in (select key from t1 except select key from t2)
There are some differences between these methods (most notably, the danger of null handling in the case of not in), but they generally do the same.
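The null-handling difference is easy to reproduce. Here is a small sketch, assuming an in-memory SQLite database via Python's sqlite3 module and made-up data. The column is called id rather than key, and t2 deliberately contains a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2), (NULL);  -- the NULL is what trips up NOT IN
""")

def ids(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

left_join = ids(
    "SELECT t1.id FROM t1 LEFT JOIN t2 ON t1.id = t2.id WHERE t2.id IS NULL")
not_exists = ids(
    "SELECT id FROM t1 WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t1.id = t2.id)")
except_form = ids(
    "SELECT id FROM t1 WHERE id IN (SELECT id FROM t1 EXCEPT SELECT id FROM t2)")
not_in = ids(
    "SELECT id FROM t1 WHERE id NOT IN (SELECT id FROM t2)")

print(left_join, not_exists, except_form, not_in)
```

With this data the left join, not exists and except forms all return ids 1 and 3, while not in returns nothing: comparing against the NULL in t2 makes the not in test evaluate to unknown for every row that has no match.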
To address your points:
Finding it hard to understand why someone would left join on an ID and
then set it to NULL in the where clause?
As mentioned, in order to exclude results from t1 that are present in t2
Could we just omit the Table2 altogether? As in not join at all?
If you don't use the join (or any of its equivalent alternatives), you will get more results, as the rows in table1 that have the same id as any row in table2 will be returned, too.
If the joining column, specifically ID, contains null values, then that is bad database design as far as I understand.
Regarding your query below, here are the possible scenarios in which the where clause makes sense.
I am assuming that your name and phone number come from table2 and that you are trying to find the name and phone number whose ID is null.
If name and phone number come from table1, and table2 is only joined on ID with nothing selected from it, then the where clause is a total waste.
SELECT
ID,
Name,
Phone
FROM
Table1
LEFT JOIN
Table2
ON
Table1.ID = Table2.ID
WHERE
Table2.ID IS NULL
Essentially, in the common business scenario above, developers add a where clause filter on the right-hand table of a left join when the values coming from the right side are not relevant and should not be part of the result set.
I have two tables,
in table1 I have 5 rows and
in table2 3 rows
table1:
#no---Name---value
1-----John---100
2-----Cooper-200
3-----Mil----300
4-----Key----200
5-----Van----300
Table 2:
#MemID-#no---FavID
19-----1-----2
21-----1-----3
22-----2-----5
Now expected result:
#no---name---value---MyFav
1-----John---100-----NULL
2-----Cooper-200-----1
3-----Mil----300-----1
4-----Key----200-----NULL
5-----Van----300-----NULL
1 indicates - My favorites
MyFav - new column ( alias)
This is the expected result, please suggest how to get it.
I think I understand the logic. You want MyFav to be marked as a 1 if that row is a favorite of John. You can do this with a left join and some more filtering:
select t1.*,
(case when t2.#no is not null then 1 end) as MyFav
from table1 t1 left join
table2 t2
on t1.#no = t2.FavId and
t2.#no = (select tt1.#no from table1 tt1 where tt1.Name = 'John');
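The query can be tried against the sample data from the question. This is a sketch only, assuming an in-memory SQLite database via Python's sqlite3 module. The #no column has to be written as the quoted identifier "#no" there, the first column of table2 is assumed to be named MemID, and an order by is added so the output order is stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE table1 ("#no" INTEGER, Name TEXT, value INTEGER);
    CREATE TABLE table2 (MemID INTEGER, "#no" INTEGER, FavID INTEGER);
    INSERT INTO table1 VALUES
        (1, 'John', 100), (2, 'Cooper', 200), (3, 'Mil', 300),
        (4, 'Key', 200), (5, 'Van', 300);
    INSERT INTO table2 VALUES (19, 1, 2), (21, 1, 3), (22, 2, 5);
''')

rows = conn.execute('''
    SELECT t1.*,
           (CASE WHEN t2."#no" IS NOT NULL THEN 1 END) AS MyFav
    FROM table1 t1
    LEFT JOIN table2 t2
      ON t1."#no" = t2.FavID
     -- only favorites that belong to John take part in the join
     AND t2."#no" = (SELECT tt1."#no" FROM table1 tt1 WHERE tt1.Name = 'John')
    ORDER BY t1."#no"
''').fetchall()

for row in rows:
    print(row)
```

Rows 2 and 3 come back with MyFav = 1 and the others with NULL, which matches the expected result in the question. Row 5 stays NULL because that favorite belongs to Cooper, not John.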
Just use a natural join for that. It joins the two tables on the columns that have the same name in both, which in your case is #no.
For more information on natural join please visit SQL Joins
I'm looking for a solution to a real problem I faced last night while joining two tables with foreign keys. I want to get all values from the second table based on the foreign key.
Here are my two tables, let's suppose:
table1 (id_user_history(PK),id_user(FK), order_no, p_quantity)
table2 (id_shoping_cart(PK), id_user(FK),order_id, prod_quantity)
Now I want to get all values from table2 by joining it with table1 on table1.id_user (FK) and table2.id_user (FK).
SELECT *
FROM table2 t2
LEFT JOIN
table1 t1
on t1.id_user = t2.id_user
This returns all records from table2 and only those records from table1 that match.
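A quick way to see that behavior, sketched with Python's sqlite3 module on an in-memory database. The rows are made up, and only a few columns are selected to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id_user_history INTEGER PRIMARY KEY, id_user INTEGER,
                         order_no INTEGER, p_quantity INTEGER);
    CREATE TABLE table2 (id_shoping_cart INTEGER PRIMARY KEY, id_user INTEGER,
                         order_id INTEGER, prod_quantity INTEGER);
    -- user 10 appears in both tables, user 20 only in table2
    INSERT INTO table1 VALUES (1, 10, 500, 2);
    INSERT INTO table2 VALUES (1, 10, 500, 2), (2, 20, 501, 1);
""")

rows = conn.execute("""
    SELECT t2.id_shoping_cart, t2.id_user, t1.id_user_history
    FROM table2 t2
    LEFT JOIN table1 t1 ON t1.id_user = t2.id_user
    ORDER BY t2.id_shoping_cart
""").fetchall()
print(rows)
```

Both table2 rows are returned. The row for user 20 has NULL (None in Python) in the table1 column because nothing in table1 matches it.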
SQL is mainly set logic. Here's a link which helps visualize.
http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
Looks like a simple join fits the bill:
select *
from table1 t1
left join
table2 t2
on t1.id_user = t2.id_user
These are two tables below-
-- This is the MAIN table through which comparisons need to be made
CREATE EXTERNAL TABLE IF NOT EXISTS Table1
(
ITEM_ID BIGINT,
CREATED_TIME STRING,
BUYER_ID BIGINT
)
CREATE EXTERNAL TABLE IF NOT EXISTS Table2
(
USER_ID BIGINT,
PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT,TIMESTAMPS:STRING>>
)
BUYER_ID and USER_ID are the same thing.
I need to find the total COUNT and all those BUYER_IDs from Table1 that are not present in Table2. So I think it's a kind of left outer join query. I am new to HiveQL, so I am having trouble figuring out what the actual syntax should be. I wrote the SQL query below. Can anyone tell me whether it is fine for my scenario?
SELECT COUNT(BUYER_ID), BUYER_ID
FROM Table1 dw
LEFT OUTER JOIN Table2 dps ON (dw.BUYER_ID = dps.USER_ID)
GROUP BY BUYER_ID;
If I understand your requirements correctly, I think you are almost there. It seems you only need to add a condition checking if there's no match between the two tables:
SELECT COUNT(BUYER_ID), BUYER_ID
FROM Table1 dw
LEFT OUTER JOIN Table2 dps ON (dw.BUYER_ID = dps.USER_ID)
WHERE dps.USER_ID IS NULL
GROUP BY BUYER_ID;
The above will filter out BUYER_IDs that do have matches in Table2, and will show the remaining BUYER_IDs and their corresponding count values. (Well, that's what I understand you want.)
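The filter can be checked on a small data set. This sketch uses an in-memory SQLite database via Python's sqlite3 module as a stand-in for Hive, so the external table setup and the PURCHASED_ITEM array column are left out, and the rows are made up. The order by and the second query, which counts the distinct unmatched buyers, are additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ITEM_ID INTEGER, CREATED_TIME TEXT, BUYER_ID INTEGER);
    CREATE TABLE Table2 (USER_ID INTEGER);
    INSERT INTO Table1 VALUES
        (100, '2012-07-01', 1), (101, '2012-07-02', 1),
        (102, '2012-07-02', 2), (103, '2012-07-03', 3);
    INSERT INTO Table2 VALUES (2);  -- only buyer 2 has a match
""")

per_buyer = conn.execute("""
    SELECT COUNT(BUYER_ID), BUYER_ID
    FROM Table1 dw
    LEFT OUTER JOIN Table2 dps ON (dw.BUYER_ID = dps.USER_ID)
    WHERE dps.USER_ID IS NULL
    GROUP BY BUYER_ID
    ORDER BY BUYER_ID
""").fetchall()

# Total number of distinct buyers without a match in Table2
total_missing = conn.execute("""
    SELECT COUNT(DISTINCT dw.BUYER_ID)
    FROM Table1 dw
    LEFT OUTER JOIN Table2 dps ON (dw.BUYER_ID = dps.USER_ID)
    WHERE dps.USER_ID IS NULL
""").fetchone()[0]

print(per_buyer, total_missing)
```

Buyer 2 is filtered out because it has a match in Table2. Buyers 1 and 3 remain, with 2 rows and 1 row in Table1 respectively.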
I've got a scenario where I need to do a join across three tables.
table #1 is a list of users
table #2 contains users who have trait A
table #3 contains users who have trait B
If I want to find all the users who have trait A or trait B (in one simple sql) I think I'm stuck.
If I do a regular join, the people who don't have trait A won't show up in the result set to see if they have trait B (and vice versa).
But if I do an outer join from table 1 to tables 2 and 3, I get all the rows in table 1 regardless of the rest of my where clause specifying a requirement against tables 2 or 3.
Before you come up with multiple sqls and temp tables and whatnot, this program is far more complex, this is just the simple case. It dynamically creates the sql based on lots of external factors, so I'm trying to make it work in one sql.
I expect there are combinations of in or exists that will work, but I was hoping for some thing simple.
But basically the outer join will always yield all results from table 1, yes?
SELECT *
FROM table1
LEFT OUTER
JOIN table2
ON ...
LEFT OUTER
JOIN table3
ON ...
WHERE NOT (table2.pk IS NULL AND table3.pk IS NULL)
or if you want to be sneaky:
WHERE COALESCE(table2.pk, table3.pk) IS NOT NULL
but for your case, I simply suggest:
SELECT *
FROM table1
WHERE table1.pk IN (SELECT fk FROM table2)
OR table1.pk IN (SELECT fk FROM table3)
or the possibly more efficient:
SELECT *
FROM table1
WHERE table1.pk IN (SELECT fk FROM table2 UNION SELECT fk FROM table3)
If you really just want the list of users that have one trait or the other, then:
SELECT userid FROM users
WHERE userid IN (SELECT userid FROM trait_a UNION SELECT userid FROM trait_b)
Regarding outerjoin specifically, longneck's answer looks like what I was in the midst of writing.
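To compare the two approaches side by side, here is a sketch on an in-memory SQLite database via Python's sqlite3 module. The table names users, trait_a and trait_b follow the query above, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY);
    CREATE TABLE trait_a (userid INTEGER);
    CREATE TABLE trait_b (userid INTEGER);
    INSERT INTO users VALUES (1), (2), (3), (4);
    INSERT INTO trait_a VALUES (1), (2);
    INSERT INTO trait_b VALUES (2), (3);
""")

# IN + UNION: users that appear in either trait table
via_union = [r[0] for r in conn.execute("""
    SELECT userid FROM users
    WHERE userid IN (SELECT userid FROM trait_a UNION SELECT userid FROM trait_b)
    ORDER BY userid
""")]

# Outer joins, then drop the rows where neither trait table matched
via_outer_join = [r[0] for r in conn.execute("""
    SELECT u.userid
    FROM users u
    LEFT OUTER JOIN trait_a a ON a.userid = u.userid
    LEFT OUTER JOIN trait_b b ON b.userid = u.userid
    WHERE COALESCE(a.userid, b.userid) IS NOT NULL
    ORDER BY u.userid
""")]

print(via_union, via_outer_join)
```

Both queries return users 1, 2 and 3 and leave out user 4. One caveat of the outer join form: if a trait table held several rows for the same user, that user would appear several times, whereas the in form returns each user once.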
I think you could do a UNION here.
May I suggest:
SELECT columnList FROM Table1 WHERE UserID IN (SELECT UserID FROM Table2)
UNION
SELECT columnList FROM Table1 WHERE UserID IN (SELECT UserID FROM Table3)
Would something like this work? Keep in mind depending on the size of the tables left outer joins can be very expensive with regards to performance.
Select *
from table1
where userid in (Select t.userid
From table1 t
left outer join table2 t2 on t.userid=t2.userid and t2.AttributeA is not null
left outer join table3 t3 on t.userid=t3.userid and t3.AttributeB is not null
where t2.userid is not null or t3.userid is not null
group by t.userid)
If all you want is the ids of the users then
SELECT UserId From Table2
UNION
SELECT UserId From Table3
is totally sufficient.
If you want more information from Table1 on these users, you can join the SQL above to Table1:
SELECT <list of columns from Table1>
FROM Table1 Join (
SELECT UserId From Table2
UNION
SELECT UserId From Table3) Users on Table1.UserID = Users.UserID