How to query multiple tables with same named FK, using a paramater - sql

SqLite:
Let's say I have 3 tables, A,B & C, each with a column unique to it (colA, colB, colC (in fact, each has several columns, but I only want one column from each)) and a Foreign Key column with the same name (let's call it Idx).
Now, let's say that I want to SELECT A.colA, b.colB, c.colC WHERE idx=:idx
That is to say, I want to pass Idx as a parameter to the query.
That's my question: what is the query?

There are several ways, but I think the best way is to explicitly join the query and then check for the id in a where clause:
SELECT A.colA, b.colB, c.colC
FROM A join
B
on A.idx = B.idx join
C
on A.idx = C.idx
WHERE A.idx = :idx;
This uses inner join, under the assumption that the id is in all three tables.
Note that if there are multiple rows with the idx value in any of the tables, then you'll get multiple rows from the query.

Related

What's the purpose of a JOIN where no column from 2nd table is being used?

I am looking through some hive queries we are running as part of analytics on our hadoop cluster, but I am having trouble understanding one. This is the Hive QL query
SELECT
c_id, v_id, COUNT(DISTINCT(m_id)) AS participants,
cast(date_sub(current_date, ${window}) as string) as event_date
from (
select
a.c_id, a.v_id, a.user_id,
case
when c.id1 is not null and a.timestamp <= c.stitching_ts then c.id2 else a.m_id
end as m_id
from (
select * from first
where event_date <= cast(date_sub(current_date, ${window}) as string)
) a
join (
select * from second
) b on a.c_id = b.c_id
left join third c
on a.user_id = c.id1
) dx
group by c_id, v_id;
I have changed the names but otherwise this is the select statement being used to insert overwrite to another table.
Regarding the join
join (
select * from second
) b on a.c_id = b.c_id
b is not used anywhere except for join condition, so is this join serving any purpose at all?
Is it for making sure that this join only has entries where c_id is present in second table? Would a where IN condition be better if thats all this is doing.
Or I can just remove this join and it won't make any difference at all.
Thanks.
Join (any inner, left or right) can duplicate rows if join key in joined dataset is not unique. For example if a contains single row with c_id=1 and b contains two rows with c_id=1, the result will be two rows with a.c_id=1.
Join (inner) can filter rows if join key is absent in joined dataset. I believe this is what it meant to do.
If the goal is to get only rows with keys present in both datasets(filter) and you do not want duplication, and you do not use columns from joined dataset, then better use LEFT SEMI JOIN instead of JOIN, it will work as filter only even if there are duplicated keys in joined dataset:
left semi join (
select c_id from second
) b on a.c_id = b.c_id
This is much safer way to filter rows only which exist in both a and b and avoid unintended duplication.
You can replace join with WHERE IN/EXISTS, but it makes no difference, it is implemented as the same JOIN, check the EXPLAIN output and you will see the same query plan. Better use LEFT SEMI JOIN, it implements uncorrelated IN/EXISTS in efficient way.
If you prefer to move it to the WHERE:
WHERE a.c_id IN (select c_id from second)
or correlated EXISTS:
WHERE EXISTS (select 1 from second b where a.c_id=b.c_id)
But as I said, all of them are implemented internally using JOIN operator.

duplicate query result when join table

I face issue about duplicate data when join table, here my sample data table I have
-- Table A
I want to join with
-- Table B
this my query notation for join both table,
select a.trans_id, name
from tableA a
inner join tableB b
on a.ID_Trans = b.trans_id
and this the result, why I get the duplicating data which should show only two lines of data, please help me to solve this case.
Firstly, as you have been told multiple times in the comments, this is working exactly as you have written, and (more importantly) as intended. You have 2 rows in tableA and those 2 rows match 2 rows in your table tableB according to the ON clause. This means that each join operation, for the each of the rows in tableA, results in 2 rows as well; thus 4 rows (2 * 2 = 4).
Considering that your table, TableA only has one column then it seems that you should be cleaning up that data and deleting the duplicates. There are plenty of examples on how to do that already (example).
Perhaps the column you show us in TableA is one many, and thus instead you have a denormalisation issue, and instead there should be another table with the details of Id_trans and a PRIMARY KEY or UNIQUE CONSTRAINT/INDEX on it. Then you would join fron that table to TableB.
Finally, what you might be after is an EXISTS, which would look like this:
SELECT B.trans_id, B.[name]
FROM dbo.TableB B
WHERE EXISTS(SELECT 1
FROM dbo.TableA A
WHERE A.ID_Trans = B.trans_id); --Odd that it's called ID_Trans in one table, and Trans_ID in another
As the comments mentioned your query does exactly what you asked it to do but I think you wanted something like:
select a.trans_id, a.name, b.name
from tableA a
inner join tableB b on a.trans_id = b.trans_id
group by a.trans_id, a.name, b.name
Since there are two rows in both table with same ID join will make them four. You can use distinct to remove duplicates:
select distinct a.trans_id, name
from tableA a
inner join tableB b
on a.id_trans = b.trans_id
But I would suggest to use exists:
select trans_id, name
from tableB b
exists (select 1 from tableA a where a.trans_id=b.trans_id)

How to join tables based on certain condition? SQL

There are three tables A,B,C
Table A has columns [ID], [flag], [many other columns]
Table B has columns [ID], [column subset of Table A]
Table C has columns [ID], [same column subset as Table B (thus also a subset of Table A), however with different values]
I want to join Table A & Table B if Flag = '1', and want to join Table A & Table C if Flag ='2'
Could you help me how I might be able to achieve this?
Many thanks!
You're looking for a UNION.
SELECT
<interesting columns>
FROM
A
JOIN
B
ON A.ID = B.ID
AND A.Flag = 1
UNION ALL
SELECT
<exactly the same interesting columns>
FROM
A
JOIN
C
ON A.ID = C.ID
AND A.Flag = 2
If the flag is really a string column, put the single quotes back. If it's numeric, leave them out.
Since the flag field in A should effectively eliminate duplicates between the result sets, I opted for UNION ALL, which is more efficient than UNION because UNION will run a DISTINCT under the covers, which in this case is likely unnecessary.

Join SQL query to get data from two tables

I'm a newbie, just learning SQL and have this question: I have two tables with the same columns. Some registers are in the two tables but others only are in one of the tables. To illustrate, suppose table A = (1,2,3,4), table B=(3,4,5,6), numbers are registers. I need to select all registers in table B if they are not in table A, that is result=(5,6). What query should I use? Maybe a join. Thanks.
You can either use a NOT IN query like this:
SELECT col from A where col not in (select col from B)
or use an outer join:
select A.col
from A LEFT OUTER JOIN B on A.col=B.col
where B.col is NULL
The first is easier to understand, but the second is easier to use with more tables in the query.
Select register from TABLE_B b
Where not exists (Select register from TABLE_A a where a.register = b.register)
I assumed you have a column named register in TABLE_A and TABLE_B

In SQL, a Join is actually an Intersection? And it is also a linkage or a "Sideway Union"?

I always thought of a Join in SQL as some kind of linkage between two tables.
For example,
select e.name, d.name from employees e, departments d
where employees.deptID = departments.deptID
In this case, it is linking two tables, to show each employee with a department name instead of a department ID. And kind of like a "linkage" or "Union" sideway".
But, after learning about inner join vs outer join, it shows that a Join (Inner join) is actually an intersection.
For example, when one table has the ID 1, 2, 7, 8, while another table has the ID 7 and 8 only, the way we get the intersection is:
select * from t1, t2 where t1.ID = t2.ID
to get the two records of "7 and 8". So it is actually an intersection.
So we have the "Intersection" of 2 tables. Compare this with the "Union" operation on 2 tables. Can a Join be thought of as an "Intersection"? But what about the "linking" or "sideway union" aspect of it?
You're on the right track; the rows returned by an INNER JOIN are those that satisfy the join conditions. But this is like an intersection only because you're using equality in your join condition, applied to columns from each table.
Also be aware that INTERSECTION is already an SQL operation and it has another meaning -- and it's not the same as JOIN.
An SQL JOIN can produce a new type of row, which has all the columns from both joined tables. For example: col4, col5, and col6 don't exist in table A, but they do exist in the result of a join with table B:
SELECT a.col1, a.col2, a.col3, b.col4, b.col5, b.col6
FROM A INNER JOIN B ON a.col2=b.col5;
An SQL INTERSECTION returns rows that are common to two separate tables, which must already have the same columns.
SELECT col1, col2, col3 FROM A
INTERSECT
SELECT col1, col2, col3 FROM B;
This happens to produce the same result as the following join:
SELECT a.col1, a.col2, a.col3
FROM A INNER JOIN B ON a.col1=b.col1 AND a.col2=b.col2 AND a.col3=b.col3;
Not every brand of database supports the INTERSECTION operator.
A join 'links' or erm... joins the rows from two tables. I think that's what you mean by 'sideways union' although I personally think that is a terrible way to phrase it. But there are different types of joins that do slightly different things:
An inner join is indeed an intersection.
A full outer join is a union.
This page on Jeff Atwood's blog describes other possibilities.
An Outer Join - is not related to - Union or Union All.
For example, a 'null' would not occur as a result of Union or Union All operation, but it results from an Outer Join.
INNER JOIN treats two NULLs as two different values. So, if you join based on a nullable column, and if both tables have NULL values in that column, then INNER JOIN will ignore those rows.
Therefore, to correctly retrieve all common rows between two tables, INTERSECT should be used. INTERSECT treats two NULLs as the same value.
Example(SQLite):
Create two tables with nullable columns:
CREATE TABLE Table1 (id INT, firstName TEXT);
CREATE TABLE Table2 (id INT, firstName TEXT);
Insert NULL values:
INSERT INTO Table1 VALUES (1, NULL);
INSERT INTO Table2 VALUES (1, NULL);
Retrieve common rows using INNER JOIN (This shows no output):
SELECT * FROM Table1 INNER JOIN Table2 ON
Table1.id=Table2.id AND Table1.firstName=Table2.firstName;
Retrieve common rows using INTERSECT (This correctly shows the common row):
SELECT * FROM Table1 INTERSECT SELECT * FROM Table2;
Conclusion:
Even though, many times both INTERSECT and INNER JOIN can be used to get the same results, they are not the same and should be picked depending on the situation.