How to join multiple rows of one table with another table? - sql

I am not sure about the question title, but I have this problem:
table1
id | from | to
1 A B
2 C A
3 B A
table2
id | table1_id
1 1
2 1
3 1
4 3
5 3
6 2
7 2
8 2
I need to get data from table1, treating the rows with ids (1,3) as one row, joined with table2,
fetching the last table2 id between them, which is 5.
the result
id | from | to | table2_id
3 B A 5
2 C A 8

Since you have not mentioned the database, I have written the SQL for SQL Server.
WITH TAB_FLAT AS
(
SELECT TAB_1.table1_id , COALESCE ( TAB_2.table2_id , TAB_1.table2_id ) max_id , TAB_1.table2_id FROM
( select table1_id , max(id) table2_id from table2 group by table1_id ) TAB_1
LEFT JOIN
( select table1_id , max(id) table2_id from table2 group by table1_id ) TAB_2
ON TAB_1.table2_id = TAB_2.table1_id
)
select TAB_FLAT.table1_id AS ID, COALESCE(C.frm ,COALESCE(B.frm,A.frm) ) AS 'FROM' ,
COALESCE(C.to2 ,COALESCE(B.to2,A.to2) ) AS 'TO' ,
TAB_FLAT.max_id AS table2_id
FROM TAB_FLAT
LEFT JOIN table1 A ON A.id = TAB_FLAT.table1_id
LEFT JOIN table1 B ON B.id = TAB_FLAT.table2_id
LEFT JOIN table1 C ON C.id = TAB_FLAT.max_id
WHERE TAB_FLAT.table1_id IN (1,2)
Demo --> https://rextester.com/DCHUN74655
Explanation:
SELECT TAB_1.table1_id , COALESCE ( TAB_2.table2_id , TAB_1.table2_id ) max_id , TAB_1.table2_id FROM
( select table1_id , max(id) table2_id from table2 group by table1_id ) TAB_1
LEFT JOIN
( select table1_id , max(id) table2_id from table2 group by table1_id ) TAB_2
ON TAB_1.table2_id = TAB_2.table1_id
We are doing a self left join to get the largest table 2 ID for each table 1 ID
AND corresponding intermediate level.
select TAB_FLAT.table1_id AS ID, COALESCE(C.frm ,COALESCE(B.frm,A.frm) ) AS 'FROM' ,
COALESCE(C.to2 ,COALESCE(B.to2,A.to2) ) AS 'TO' ,
TAB_FLAT.max_id AS table2_id
FROM TAB_FLAT
LEFT JOIN table1 A ON A.id = TAB_FLAT.table1_id
LEFT JOIN table1 B ON B.id = TAB_FLAT.table2_id
LEFT JOIN table1 C ON C.id = TAB_FLAT.max_id
We are joining to table 1 based on table1 id , intermediate id and largest id
WHERE TAB_FLAT.table1_id IN (1,2)
Keeping only the records with ID = 1 or 2

First, write a query to get each row's max ID in table 2.
select t1.*, max(t2.id) as table2_id
from table2 t2
join table1 t1 on t2.table1_id = t1.id
group by t1.id
Then use that as a CTE for two queries: one to get all the rows except 1 and 3, and one to get 1 or 3, whichever has the greater table2_id. Union them together.
with max_table2 as (
select t1.*, max(t2.id) as table2_id
from table2 t2
join table1 t1 on t2.table1_id = t1.id
group by t1.id
)
select *
from max_table2
where id not in (1,3)
union
(
select *
from max_table2
where id in (1,3)
order by table2_id desc
limit 1
)
If the combined 1/3 row must have the ID of 1, despite 3 having the larger table2_id, this can be done by hard coding the ID in the select query.
with max_table2 as (
select t1.*, max(t2.id) as table2_id
from table2 t2
join table1 t1 on t2.table1_id = t1.id
group by t1.id
)
select *
from max_table2
where id not in (1,3)
union
(
select 1 as id, "from", "to", table2_id
from max_table2
where id in (1,3)
order by table2_id desc
limit 1
)
I suspect rather than hard coding rows 1 and 3, you're actually counting them as equivalent because they have the same path, just reversed. We can make this query more generic.
First, normalize the from/to so they're in the same order. While we're at it, also get their max table2 id.
select
t1.id,
case when "from" < "to" then "from" else "to" end as "from",
case when "from" < "to" then "to" else "from" end as "to",
max(t2.id) as max_table2_id
from table2 t2
join table1 t1 on t2.table1_id = t1.id
group by t1.id
id from to max_table2_id
2 A C 8
3 A B 5
1 A B 3
Then rank the paths with the same from/to.
with normalized_max_table2 as (
select
t1.id,
case when "from" < "to" then "from" else "to" end as "from",
case when "from" < "to" then "to" else "from" end as "to",
max(t2.id) as table2_id
from table2 t2
join table1 t1 on t2.table1_id = t1.id
group by t1.id
)
select *,
rank() over (partition by "from", "to" order by table2_id desc) as "rank"
from normalized_max_table2
id from to table2_id rank
3 A B 5 1
1 A B 3 2
2 A C 8 1
And, finally, select only the first ranks.
with normalized_max_table2 as (
select
t1.id,
case when "from" < "to" then "from" else "to" end as "from",
case when "from" < "to" then "to" else "from" end as "to",
max(t2.id) as table2_id
from table2 t2
join table1 t1 on t2.table1_id = t1.id
group by t1.id
),
ranked_max_table2 as (
select *,
rank() over (partition by "from", "to" order by table2_id desc) as "rank"
from normalized_max_table2
)
select id, "from", "to", table2_id
from ranked_max_table2
where "rank" = 1
id from to table2_id
3 A B 5
2 A C 8
That's a long-hand way of doing it. There may be a more compact way.
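To check the ranked query against the sample data, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite (3.25 or later, for window functions) only stands in for whatever database you use. The names node_a, node_b and rnk are placeholders introduced here, and unlike the query above this variant keeps the original from/to orientation in the output so it matches the result asked for in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table table1 (id integer primary key, "from" text, "to" text);
create table table2 (id integer primary key, table1_id integer);
insert into table1 values (1, 'A', 'B'), (2, 'C', 'A'), (3, 'B', 'A');
insert into table2 values (1, 1), (2, 1), (3, 1), (4, 3), (5, 3), (6, 2), (7, 2), (8, 2);
""")

query = """
with normalized as (
    -- node_a/node_b hold the endpoints in alphabetical order,
    -- so A->B and B->A fall into the same partition
    select t1.id, t1."from", t1."to",
           case when t1."from" < t1."to" then t1."from" else t1."to" end as node_a,
           case when t1."from" < t1."to" then t1."to" else t1."from" end as node_b,
           max(t2.id) as table2_id
    from table2 t2
    join table1 t1 on t2.table1_id = t1.id
    group by t1.id, t1."from", t1."to"
),
ranked as (
    select *,
           rank() over (partition by node_a, node_b order by table2_id desc) as rnk
    from normalized
)
select id, "from", "to", table2_id
from ranked
where rnk = 1
order by table2_id
"""

rows = conn.execute(query).fetchall()
print(rows)
```

Partitioning on the normalized pair while selecting the untouched columns is a design choice: the grouping ignores direction, but the reported row is exactly the table1 row that owns the largest table2 id.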

Related

How to find the same elements in two arrays of two different tables in HIVE?

I have two tables like the following:
table1:
id | sid
1 | ['101', '102', '103']
2 | ['102', '101', '103']
3 | ['103', '101', '102']
table2:
id | sid
1 | ['101', '102', '103']
3 | ['102', '103']
and I wish to get the following table:
id | sid
1 | ['101', '102', '103']
2 | ['102', '101', '103']
3 | ['103', '102']
Explanation: I wish to select the same elements in table1.sid and table2.sid with the same order in table1. Besides, if the id in table1 doesn't exist in table2, then keep the sid as it is in table1. What should I do?
You can use posexplode() to basically do what you want. Of course, all this array stuff is more complex in Hive than in other databases, particularly getting the results in the order you want:
select t.id, collect_list(t.sid2) as sid
from (select t1.id, t2.sid2, t1.pos1
from (select a.id, pe1.pos1, pe1.sid1
from table1 a lateral view
posexplode(a.sid) pe1 as pos1, sid1
) t1 left join
(select b.id, pe2.sid2
from table2 b lateral view
posexplode(b.sid) pe2 as pos2, sid2
) t2
on t2.id = t1.id and t2.sid2 = t1.sid1
distribute by t1.id
sort by t1.id, t1.pos1
) t
group by t.id
Something like this:
with t as (
select x.id, collect_list(x.sid2) as sid
from (select t1.id, t2.sid2, t1.pos1
from (select a.id, pe1.pos1, pe1.sid1
from table1 a lateral view
posexplode(a.sid) pe1 as pos1, sid1
) t1 left join
(select b.id, pe2.sid2
from table2 b lateral view
posexplode(b.sid) pe2 as pos2, sid2
) t2
on t2.id = t1.id and t2.sid2 = t1.sid1
distribute by t1.id
sort by t1.id, t1.pos1
) x
group by x.id
)
select t1.id,
(case when size(t.sid) = 0 then t1.sid else t.sid end) as sid
from table1 t1 left join
t
on t1.id = t.id
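As a reference for what the Hive query is meant to produce, here is a small Python sketch of the same rule applied to the sample data: keep the elements of table1.sid that also occur in table2.sid, in table1's order, and leave sid untouched when the id is missing from table2. The function name ordered_common and the dictionaries are illustrative only, not part of any Hive API.

```python
def ordered_common(sid1, sid2):
    """Return the elements of sid1 that also occur in sid2, in sid1's order."""
    allowed = set(sid2)
    return [s for s in sid1 if s in allowed]


# Sample data from the question, keyed by id.
table1 = {
    1: ['101', '102', '103'],
    2: ['102', '101', '103'],
    3: ['103', '101', '102'],
}
table2 = {
    1: ['101', '102', '103'],
    3: ['102', '103'],
}

# Ids missing from table2 keep their original sid.
result = {
    row_id: ordered_common(sid, table2[row_id]) if row_id in table2 else sid
    for row_id, sid in table1.items()
}
print(result)
```

Comparing the Hive output against this on a few rows is a quick way to confirm that the ordering by pos1 survived the aggregation.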

Order by in subquery and alias

I have a problem with an ORDER BY in an Oracle query.
select KEY, B, C, (select D from TABLE1 a where a.KEY = b.KEY and a.DATE<
b.DATE order BY a.DATE and rownum =1 )
FROMSTATUS from TABLE2 b
I know that "order by" does not work in this subquery, so I modified my query as:
select KEY, B, C, (select * from (select D from TABLE1 a where a.KEY =
b.KEY and a.DATE< b.DATE order by DATE) where rownum = 1)
FROMSTATUS from TABLE2 b
But this way b.KEY and b.DATE are not resolved by Oracle.
I need to select only one value from TABLE2: the one whose DATE is the closest one before the TABLE1 date.
Example:
TABLE1
KEY DATE A B C
1 01/31/2000 1 2 3
2 02/25/2000 X Y Z
TABLE2
KEY DATE D
1 01/30/2000 1
1 01/27/2000 2
1 01/25/2000 2
2 02/20/2000 4
2 02/13/2000 1
I need this result:
TABLE1.KEY TABLE1.DATE TABLE1.A TABLE1.B TABLE1.C TABLE2.DATE TABLE2.D
1 01/31/2000 1 2 3 01/30/2000 1
2 02/25/2000 X Y Z 02/20/2000 4
Can you help me?
(I am sorry for my bad English.)
row_number() after union will get your output.
select tFinal.DATE, tFinal.KEY
from (select row_number() over (partition by KEY order by t1.T, t1.DATE desc) as rn, t1.DATE, t1.KEY
from
(select DATE, KEY, 't1' as T from TABLE1
union all
select DATE, KEY, 't2' as T from TABLE2) t1) tFinal
Where rn = 2
You can use window functions for this:
WITH cte AS (
SELECT TABLE2.KEY, TABLE2.B, TABLE2.C, TABLE1.D
, ROW_NUMBER() OVER (PARTITION BY TABLE2.KEY, TABLE2.DATE ORDER BY TABLE1.DATE DESC) AS rn
FROM TABLE2
LEFT JOIN TABLE1 ON TABLE2.KEY = TABLE1.KEY AND TABLE2.DATE > TABLE1.DATE
)
SELECT *
FROM cte
WHERE rn = 1
Here's an answer that uses aggregation:
WITH t1 AS (SELECT 1 KEY, to_date('31/01/2000', 'dd/mm/yyyy') dt FROM dual UNION ALL
SELECT 2 KEY, to_date('25/02/2000', 'dd/mm/yyyy') dt FROM dual),
t2 AS (SELECT 1 KEY, to_date('30/01/2000', 'dd/mm/yyyy') dt FROM dual UNION ALL
SELECT 1 KEY, to_date('27/01/2000', 'dd/mm/yyyy') dt FROM dual UNION ALL
SELECT 1 KEY, to_date('25/01/2000', 'dd/mm/yyyy') dt FROM dual UNION ALL
SELECT 2 KEY, to_date('20/02/2000', 'dd/mm/yyyy') dt FROM dual UNION ALL
SELECT 2 KEY, to_date('13/02/2000', 'dd/mm/yyyy') dt FROM dual)
SELECT t1.KEY,
t1.dt t1_date,
MAX(t2.dt) t2_date
FROM t1
LEFT OUTER JOIN t2 ON t1.key = t2.key AND t2.dt < t1.dt
GROUP BY t1.key, t1.dt
ORDER BY t1.key;
KEY T1_DATE T2_DATE
---------- ----------- -----------
1 31/01/2000 30/01/2000
2 25/02/2000 20/02/2000
I'm assuming here that t1.key is a unique column. Whether this is more performant than any of the other answers for your data is up to you to test *{:-)
In Oracle you can use KEEP LAST for this:
select
key,
b,
c,
(
select max(d) keep (dense_rank last order by t2.date)
from table2 t2
where t2.key = t1.key and t2.date < t1.date
) as fromstatus
from table1 t1;
As of Oracle 12c you can also use FETCH FIRST ROW:
select
key,
b,
c,
(
select d
from table2 t2
where t2.key = t1.key and t2.date < t1.date
order by t2.date desc
fetch first row only
) as fromstatus
from table1 t1;
or, moving the subquery to the FROM clause:
select
t1.key,
t1.b,
t1.c,
first_t2.d as fromstatus
from table1 t1
outer apply
(
select d
from table2 t2
where t2.key = t1.key and t2.date < t1.date
order by t2.date desc
fetch first row only
) first_t2;
This last query has the advantage that you could easily select more values from the table2 row than just one.
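For anyone who wants to try the correlated-subquery idea without an Oracle instance, here is a rough sketch of the same lookup in SQLite via Python's sqlite3 module. It is an analogue, not Oracle syntax: LIMIT 1 stands in for FETCH FIRST ROW ONLY, the dates are stored as ISO strings so they sort correctly, and the column names key_id and dt are placeholders chosen to avoid the reserved words KEY and DATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table table1 (key_id integer, dt text, a text, b text, c text);
create table table2 (key_id integer, dt text, d integer);
insert into table1 values
    (1, '2000-01-31', '1', '2', '3'),
    (2, '2000-02-25', 'X', 'Y', 'Z');
insert into table2 values
    (1, '2000-01-30', 1), (1, '2000-01-27', 2), (1, '2000-01-25', 2),
    (2, '2000-02-20', 4), (2, '2000-02-13', 1);
""")

query = """
select t1.key_id, t1.dt,
       -- latest table2 row strictly before the table1 date
       (select t2.d
        from table2 t2
        where t2.key_id = t1.key_id and t2.dt < t1.dt
        order by t2.dt desc
        limit 1) as fromstatus
from table1 t1
order by t1.key_id
"""

rows = conn.execute(query).fetchall()
print(rows)
```

On the sample data this picks d = 1 for key 1 and d = 4 for key 2, the same rows as in the expected result of the question.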

require to form a sql query

I was working on preparing a query where I was stuck.
Consider tables below:
table1
id key col1
-- --- -----
1 1 abc
2 2 d
3 3 s
4 4 xyz
table2
id col1 foreignkey
-- ---- ----------
1 12 1
2 13 1
3 14 1
4 12 2
5 13 2
Now what I need is to select only those records from table1 for which none of the corresponding entries in table2 has a given col1 value, say 12.
The challenge is that after a join, even though the row with col1 equal to 12 is skipped, the same foreignkey still has other rows with values such as 13 and 14. What I want is that if there is even a single row with the value 12, that id should not be picked from table1 at all.
How can I form a query with this?
The output I need: from the table structure above, I want to get those records from table1 for which no corresponding col1 value in table2 is 14.
So my query should return only row 2 from table1 and not row 1.
Another way of doing that. The two CTEs only build the sample data.
;WITH t1(id ,[key] ,col1) AS
(
SELECT 1 , 1 , 'abc' UNION ALL
SELECT 2 , 2 , 'd' UNION ALL
SELECT 3 , 3 , 's' UNION ALL
SELECT 4 , 4 , 'xyz'
)
,t2(id ,col1, foreignkey) AS
(
SELECT 1 , 12 , 1 UNION ALL
SELECT 2 , 13 , 1 UNION ALL
SELECT 3 , 14 , 1 UNION ALL
SELECT 4 ,12 , 2 UNION ALL
SELECT 5 ,13 , 2
)
SELECT id, [key], col1
FROM t1
WHERE id NOT IN (SELECT t2.foreignkey
FROM t2
INNER JOIN t1 ON t1.Id = t2.foreignkey
WHERE t2.col1 = 14)
This is a typical case for NOT EXISTS:
SELECT id, [key], col1
FROM table1 t1
WHERE NOT EXISTS (SELECT 1
FROM table2 t2
WHERE t2.foreignkey = t1.id AND t2.col1 = 14)
The above query will not select a row from table1 if there is a single correlated row in table2 having col1 = 14.
Output:
id key col1
-------------
2 2 d
3 3 s
4 4 xyz
If you want to return records that, in addition to the criterion set above, also have correlated records in table2, then you can use the following query:
SELECT t1.id, MAX(t1.[key]) AS [key], MAX(t1.col1) AS col1
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.foreignkey
GROUP BY t1.id
HAVING COUNT(CASE WHEN t2.col1 = 14 THEN 1 END) = 0
Output:
id key col1
-------------
2 2 d
You can also achieve the same result with the second query using a combination of EXISTS and NOT EXISTS:
SELECT id, [key], col1
FROM table1 t1
WHERE EXISTS (SELECT 1
FROM table2 t2
WHERE t2.foreignkey = t1.id)
AND
NOT EXISTS (SELECT 1
FROM table2 t3
WHERE t3.foreignkey = t1.id AND t3.col1 = 14)
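
The two variants above can be checked against the sample data with a short script. This is only a sketch: it uses an in-memory SQLite database through Python's sqlite3 module instead of SQL Server, and quotes the key column with double quotes rather than brackets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table table1 (id integer primary key, "key" integer, col1 text);
create table table2 (id integer primary key, col1 integer, foreignkey integer);
insert into table1 values (1, 1, 'abc'), (2, 2, 'd'), (3, 3, 's'), (4, 4, 'xyz');
insert into table2 values (1, 12, 1), (2, 13, 1), (3, 14, 1), (4, 12, 2), (5, 13, 2);
""")

# Rows of table1 with no related table2 row where col1 = 14
# (rows without any table2 entry qualify too).
not_exists_sql = """
select t1.id
from table1 t1
where not exists (select 1 from table2 t2
                  where t2.foreignkey = t1.id and t2.col1 = 14)
order by t1.id
"""

# Same condition, but additionally requiring at least one related row.
exists_and_not_exists_sql = """
select t1.id
from table1 t1
where exists (select 1 from table2 t2 where t2.foreignkey = t1.id)
  and not exists (select 1 from table2 t3
                  where t3.foreignkey = t1.id and t3.col1 = 14)
order by t1.id
"""

ids_without_14 = [row[0] for row in conn.execute(not_exists_sql)]
ids_with_rows_without_14 = [row[0] for row in conn.execute(exists_and_not_exists_sql)]
print(ids_without_14, ids_with_rows_without_14)
```

The first list contains ids 2, 3 and 4, the second only id 2, which matches the two outputs shown above.
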
select t1.id,t1.key,
(select ROW_NUMBER() OVER(PARTITION BY col1 ORDER BY col1 DESC) AS Row,* into
#Temp from table1)
from table1 t1
inner join table2 t2 on t1.id=t2.foreignkey
where t2.col1=(select col1 from #temp where row>1)

sql - how to select multiple columns with only one distinct column from joining multiple tables

I am using SQL Server. I want to select multiple columns with only one distinct column.
For example,
TABLE 1:
ID NAME ...(other columns)
1 A
2 B
3 C
Table 2 (ID and number together is the unique key):
ID Number Year...(other columns)
1 111 2011
2 12345678 2011
2 22222222 2012
3 333 2013
Table 3:
Name Company ...(other columns)
A Amazon
B Google
C Amazon
Each table above has many columns (more than 2). How can I get a result with only these 5 columns, without the other "useless" columns, and with ID as the distinct column?
More specifically, for example,
The normal sql statement I had is the following:
select distinct ID, NAME, NUMBER, COMPANY, Year
from table1
left join table2 on table1.ID = table2.ID
left join table3 on table1.name = table3.name
group by ID, NAME, NUMBER, COMPANY, year
order by ID desc, Year desc
This will output the following:
ID NAME NUMBER COMPANY YEAR
1 A 111 Amazon 2011
2 B 12345678 google 2011
2 B 22222222 google 2012
3 c 333 Amazon 2013
What I want to have is actually the following:
ID NAME NUMBER COMPANY YEAR
1 A 111 Amazon 2011
2 B 22222222 google 2012
3 c 333 Amazon 2013
I want to have the results without duplicated ID. If there are duplicate ID's, I want to show only the latest one. In above example, ID 2 has 2 rows in table2. I want to show the one with the latest date which is 2012.
How can I achieve this? Thanks in advance.
You can use not exists to only select the latest rows per id (where another row with the same id and a greater year does not exist).
select * from table1 t1
where not exists (
select 1 from table1 t2
where t2.id = t1.id
and t2.year > t1.year
)
Using analytic functions (this is often faster than the query above):
select * from
(select *,
row_number() over(partition by id order by year desc) rn
from table1) t1 where rn = 1
edit: applied to your tables
select t2.id, t3.name, t2.number, t3.company, t2.year from
(
select * from
(select *,
row_number() over(partition by id order by year desc) rn
from table2
) t1 where rn = 1
) t2 join table1 t1 on t2.id = t1.id
join table3 t3 on t3.name = t1.name
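
A quick way to sanity-check the row_number() approach on the sample tables is to run it in an in-memory database. The sketch below assumes SQLite 3.25 or later through Python's sqlite3 module rather than SQL Server; the alias latest is a name introduced here for the derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table table1 (id integer, name text);
create table table2 (id integer, number integer, year integer);
create table table3 (name text, company text);
insert into table1 values (1, 'A'), (2, 'B'), (3, 'C');
insert into table2 values (1, 111, 2011), (2, 12345678, 2011),
                          (2, 22222222, 2012), (3, 333, 2013);
insert into table3 values ('A', 'Amazon'), ('B', 'Google'), ('C', 'Amazon');
""")

query = """
select t1.id, t1.name, latest.number, t3.company, latest.year
from (
    -- number the table2 rows of each id, newest year first
    select id, number, year,
           row_number() over (partition by id order by year desc) as rn
    from table2
) latest
join table1 t1 on latest.id = t1.id
join table3 t3 on t3.name = t1.name
where latest.rn = 1
order by t1.id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only the 2012 row survives for ID 2, because the partition is on id alone; partitioning on more columns would number each distinct combination separately and let duplicates through.
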
WITH CTE AS
(
SELECT t1.ID, t1.NAME, t2.NUMBER, t3.COMPANY, t2.Year,
Row_number() OVER(partition BY t1.ID ORDER BY t2.Year DESC) AS rn
FROM table1 t1
LEFT JOIN table2 t2 ON t1.ID = t2.ID
LEFT JOIN table3 t3 ON t1.name = t3.name
)
SELECT ID, NAME, NUMBER, COMPANY, Year
FROM CTE
WHERE rn = 1
ORDER BY ID desc, Year desc
I used a correlated subquery; note that correlated subqueries can be inefficient on large tables.
select distinct t1.ID, t1.NAME, t2.NUMBER, t3.COMPANY, t2.Year
from table1 t1
left join table2 t2 on t1.ID = t2.ID
inner join table3 t3 on t1.name = t3.name --inner join to select the latest record only
and t2.Year = (Select MAX(year) from table2 t22
where t22.ID = t2.Id group by ID)
group by t1.ID, t1.NAME, t2.NUMBER, t3.COMPANY, t2.year
order by t1.ID, t2.Year desc
EDIT: using a more efficient CTE
WITH CTE as
(
Select Id, MAX(year) as [yr] from table2 t2 group by ID
)
select distinct t1.ID, t1.NAME, t2.NUMBER, t3.COMPANY, t2.Year
from table1 t1
left join table2 t2 on t1.ID = t2.ID
left join table3 t3 on t1.name = t3.name
inner join CTE on cte.yr = t2.Year
and t2.Id = CTE.Id
group by t1.ID, t1.NAME, t2.NUMBER, t3.COMPANY, t2.year
order by t1.ID, t2.Year desc

Best practices for multi table join query

Tables structure are below :
Table1 (ID int, value1 int,...)
ID Value1
---- --------
1 10
2 20
5 12
Table2 (ID int, value2 int,...)
ID Value2
---- --------
1 13
3 24
4 11
Table3 (ID int, value3 int,...)
ID Value3
---- --------
4 150
5 100
My expected output is below.
ID Value1 Value2 Value3
---- -------- -------- --------
1 10 13 NULL
2 20 NULL NULL
3 NULL 24 NULL
4 NULL 11 150
5 12 NULL 100
It should be noted that above tables is huge and I want to have best performance.
My query suggestion is below :
Select ID,
SUM(Value1) AS Value1,
SUM(Value2) AS Value2,
SUM(Value3) AS Value3
From (
Select ID, Value1 , NULL as Value2, NULL as Value3
From Table1
Union ALL
Select ID, NULL , Value2, NULL
From Table2
Union ALL
Select ID, NULL, NULL, Value3
From Table3
)Z
Group By Z.ID
Assuming you only have one value per id, this should do the trick:
SELECT aux.ID, t1.Value1, t2.Value2, t3.Value3
FROM
(SELECT ID FROM Table1
UNION
select ID FROM Table2
UNION
SELECT ID FROM Table3) aux
LEFT OUTER JOIN Table1 t1 ON aux.ID = t1.ID
LEFT OUTER JOIN Table2 t2 ON aux.ID = t2.ID
LEFT OUTER JOIN Table3 t3 ON aux.ID = t3.ID
If you've more than one value:
SELECT aux.ID, SUM(t1.Value1) as 'Value1', SUM(t2.Value2) as 'Value2', SUM(t3.Value3) as 'Value3'
FROM
(SELECT ID FROM Table1
UNION
select ID FROM Table2
UNION
SELECT ID FROM Table3) aux
LEFT OUTER JOIN Table1 t1 ON aux.ID = t1.ID
LEFT OUTER JOIN Table2 t2 ON aux.ID = t2.ID
LEFT OUTER JOIN Table3 t3 ON aux.ID = t3.ID
GROUP BY aux.ID
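
The one-value-per-id version can be verified on the sample data with a few lines of Python. This is a sketch using an in-memory SQLite database via the sqlite3 module, not a benchmark; it says nothing about performance on huge tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table table1 (id integer, value1 integer);
create table table2 (id integer, value2 integer);
create table table3 (id integer, value3 integer);
insert into table1 values (1, 10), (2, 20), (5, 12);
insert into table2 values (1, 13), (3, 24), (4, 11);
insert into table3 values (4, 150), (5, 100);
""")

query = """
select aux.id, t1.value1, t2.value2, t3.value3
from (
    -- every id that appears in at least one of the three tables
    select id from table1
    union
    select id from table2
    union
    select id from table3
) aux
left join table1 t1 on aux.id = t1.id
left join table2 t2 on aux.id = t2.id
left join table3 t3 on aux.id = t3.id
order by aux.id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

One caveat for the SUM variant: if an id repeats in more than one table, joining first multiplies the rows and inflates the sums, so aggregating each table per id before joining is the safer order.
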
I initially wrote the same answer as aF. did above, so I removed it and used a different approach.
Here,
the 1st query gets all rows from table1,
the 2nd query gets all rows from table2, skipping those already present in table1,
the 3rd query gets all remaining rows, skipping those in the above two queries.
SELECT T1.ID, T1.VALUE1, T2.VALUE2, T3.VALUE3 --all T1
FROM TABLE1 T1
LEFT JOIN TABLE2 T2 ON T1.ID=T2.ID
LEFT JOIN TABLE3 T3 ON T1.ID=T3.ID
UNION
SELECT T2.ID, T1.VALUE1, T2.VALUE2, T3.VALUE3 --all T2 where T1 is NULL
FROM TABLE2 T2
LEFT JOIN TABLE1 T1 ON T2.ID=T1.ID
LEFT JOIN TABLE3 T3 ON T2.ID=T3.ID
WHERE T1.ID IS NULL
UNION
SELECT T3.ID, T1.VALUE1, T2.VALUE2, T3.VALUE3 --all T3 where T1 is NULL AND T2 is NULL
FROM TABLE3 T3
LEFT JOIN TABLE1 T1 ON T3.ID=T1.ID
LEFT JOIN TABLE2 T2 ON T3.ID=T2.ID
WHERE T1.ID IS NULL
AND T2.ID IS NULL