Chaining joins in SQL based on a dynamic table

The title may not be accurate for the question but here goes! I have the following table:
id1  id2  status
1    2    a
2    3    b
3    4    c
6    7    d
7    8    e
8    9    f
9    10   g
I would like to get the first id1 and last status based on a dynamic chain joining, meaning that the result table will be:
id   final_status
1    c
6    g
Logically, I want to construct the following arrays based on joining the table to itself:
id1  chained_ids   chained_status
1    [2,3,4]       [a,b,c]
6    [7,8,9,10]    [d,e,f,g]
Then grab the last element of the chained_status list.
If we were to keep joining this table to itself on id1 = id2, we would eventually end up with single rows holding these results. The problem is that the number of joins is not constant (a single id may be chained many times or only a few). There is always a 1-to-1 mapping of id1 to id2.
Thanks in advance! This can be done in either T-SQL or Hive (if someone has a clever map-reduce solution).

You can do this with a recursive CTE. The anchor part picks the rows that start a chain (an id1 that never appears as an id2), and the recursive part follows each id2 to the next row's id1, tracking the depth in lvl; the outer query then keeps the deepest row of each chain:
;WITH My_CTE AS
(
    SELECT id1,
           id2,
           status,
           1 AS lvl
    FROM My_Table T1
    WHERE NOT EXISTS
          (SELECT *
           FROM My_Table T2
           WHERE T2.id2 = T1.id1)
    UNION ALL
    SELECT CTE.id1,
           T3.id2,
           T3.status,
           CTE.lvl + 1
    FROM My_CTE CTE
    INNER JOIN My_Table T3 ON T3.id1 = CTE.id2
)
SELECT CTE.id1,
       CTE.status
FROM My_CTE CTE
INNER JOIN (SELECT id1, MAX(lvl) AS max_lvl FROM My_CTE GROUP BY id1) M
    ON M.id1 = CTE.id1
   AND M.max_lvl = CTE.lvl
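A quick way to sanity-check the CTE is to run it against the sample data. The sketch below does so through SQLite via Python's sqlite3; note that SQLite (unlike T-SQL) requires the RECURSIVE keyword, and the table/column names simply follow the answer above.

```python
# Sanity check of the recursive CTE against the question's sample data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE My_Table (id1 INT, id2 INT, status TEXT);
INSERT INTO My_Table VALUES
  (1,2,'a'),(2,3,'b'),(3,4,'c'),
  (6,7,'d'),(7,8,'e'),(8,9,'f'),(9,10,'g');
""")
rows = conn.execute("""
WITH RECURSIVE My_CTE AS (
    -- anchor: chain starts, i.e. id1 values that never appear as an id2
    SELECT id1, id2, status, 1 AS lvl
    FROM My_Table T1
    WHERE NOT EXISTS (SELECT * FROM My_Table T2 WHERE T2.id2 = T1.id1)
    UNION ALL
    -- follow each chain one link at a time
    SELECT CTE.id1, T3.id2, T3.status, CTE.lvl + 1
    FROM My_CTE CTE
    JOIN My_Table T3 ON T3.id1 = CTE.id2
)
SELECT CTE.id1, CTE.status
FROM My_CTE CTE
JOIN (SELECT id1, MAX(lvl) AS max_lvl FROM My_CTE GROUP BY id1) M
  ON M.id1 = CTE.id1 AND M.max_lvl = CTE.lvl
ORDER BY CTE.id1
""").fetchall()
print(rows)  # [(1, 'c'), (6, 'g')]
```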


Getting common value count of text array column in Postgres

I have a table which looks like this:
id num
--- ----
1 {'1','2','3','3'}
2 {'2','3'}
3 {'5','6','7'}
Here id is a unique column and num is a text array which can contain duplicate elements. I want to do something like an intersection between each pair of rows, so that I get the count of common elements between the num of the two rows. Consider something like a set, where duplicates are considered only once. For example, for the above table I am expecting something like the following:
id1 id2 count
--- --- -----
1 2 2
1 3 0
2 1 2
2 3 0
3 1 0
3 2 0
It is not necessary to get the output like the above. The only part I am concerned about is count.
I have the following query which gives the output only for one id compared with one other id:
select unnest(num) from emp where id=1
intersect
select unnest(num) from emp where id=2
How can I generalize it to get the required output?
A straightforward approach puts the intersection of the unnested arrays in a subquery and gets their count.
SELECT t1.id id1,
       t2.id id2,
       (SELECT count(*)
        FROM (SELECT num1.num
              FROM unnest(t1.num) num1(num)
              INTERSECT
              SELECT num2.num
              FROM unnest(t2.num) num2(num)) x) count
FROM emp t1
INNER JOIN emp t2 ON t2.id > t1.id
ORDER BY t1.id, t2.id;
If you are only interested in whether the arrays share any elements, not in the exact count, you can also use the overlap operator &&.
SELECT t1.id id1,
       t2.id id2,
       t1.num && t2.num intersection_not_empty
FROM emp t1
INNER JOIN emp t2 ON t2.id > t1.id
ORDER BY t1.id, t2.id;
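To see what the first query computes, the distinct-element intersection counts can be sketched with Python sets (data taken from the question). permutations yields both orderings of each pair, as in the question's expected output; the SQL above keeps only one direction via t2.id > t1.id.

```python
# Distinct-intersection counts per pair of rows, mirroring the
# unnest + INTERSECT logic with Python sets.
from itertools import permutations

emp = {1: ['1', '2', '3', '3'], 2: ['2', '3'], 3: ['5', '6', '7']}

result = [(a, b, len(set(emp[a]) & set(emp[b])))
          for a, b in permutations(emp, 2)]
print(result)
# [(1, 2, 2), (1, 3, 0), (2, 1, 2), (2, 3, 0), (3, 1, 0), (3, 2, 0)]
```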
For the example data, this works:
with t as (
      select v.*
      from (values (1000, array['acct', 'hr']), (1005, array['dev', 'hr'])) v(empid, depts)
     )
select t1.empid, t2.empid,
       (select count(distinct d1)
        from unnest(t1.depts) d1 join
             unnest(t2.depts) d2
             on d1 = d2
       ) cnt
from t t1 join
     t t2
     on t1.empid < t2.empid;
I'm not 100% sure this is what you intend, though.

How should I query these tables?

This is the database I have:
This is the (first) Offer table, with articles and their respective IDs:
This is the (second) Bid table, with the offered articles:
I have to find the IDs of the articles that have received the same number of bids.
So the output I want is:
ID1 ID2 Number_of_Orders
1 2 2
1 5 2
2 5 2
I tried to join it into inline views:
SELECT DISTINCT * FROM
(SELECT BID.ID as ID1 FROM OFFER
INNER JOIN BID ON OFFER.ID=BID.ID
GROUP BY BID.ID) v1,
(SELECT BID.ID as ID2 FROM OFFER
INNER JOIN BID ON OFFER.ID=BID.ID
GROUP BY BID.ID) v2,
(SELECT COUNT(GID) as NUMBER_OF_ORDERS FROM BID
INNER JOIN OFFER ON OFFER.ID=BID.ID
GROUP BY BID.ID
) v3;
but I do not know how to output the two IDs under the condition that they have the same number of orders (bids).
You seem to want to count the bids for each ID, and then do a self-join on that result to find matches:
with cte (id, number_of_bids) as (
select id, count(*)
from bid
group by id
)
select c1.id as id1, c2.id as id2, c1.number_of_bids
from cte c1
join cte c2
on c2.number_of_bids = c1.number_of_bids
and c2.id > c1.id
order by id1, id2;
ID1 ID2 NUMBER_OF_BIDS
---------- ---------- --------------
1 2 2
1 5 2
2 5 2
The CTE just gets the number of bids for each ID with simple aggregation. (You could do it with inline views instead of a CTE, but then you'd be counting them twice, once in each inline view.)
Then the main query joins that CTE to itself, on the aggregated number_of_bids being equal and on the second ID being higher than the first, which eliminates duplicates. Without that you'd see a row where ID1 was 5 and ID2 was 2, i.e. the reverse of the last of the three rows you want (and the same for the other two), plus each ID/count matched with itself.
You don't need to join to the offer table - you aren't using any data from it.
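The question's actual Bid table isn't shown (the screenshots are missing), so the rows below are invented to reproduce the expected output; this only demonstrates the CTE self-join, here run through SQLite via Python's sqlite3:

```python
# CTE self-join on equal bid counts; bid rows are made up so that
# IDs 1, 2 and 5 each have two bids and ID 3 has one.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bid (id INT, gid INT);
INSERT INTO bid VALUES (1,10),(1,11),(2,12),(2,13),(5,14),(5,15),(3,16);
""")
rows = conn.execute("""
WITH cte (id, number_of_bids) AS (
    SELECT id, COUNT(*) FROM bid GROUP BY id
)
SELECT c1.id AS id1, c2.id AS id2, c1.number_of_bids
FROM cte c1
JOIN cte c2
  ON c2.number_of_bids = c1.number_of_bids
 AND c2.id > c1.id
ORDER BY id1, id2
""").fetchall()
print(rows)  # [(1, 2, 2), (1, 5, 2), (2, 5, 2)]
```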
You simply join (inner join) these two tables and add a condition such as table1.bidPrice = table2.bidPrice

SQL query on same duplicate tables

Working with Access, I have the following table
ID Root_ID Level Code
S1 S 10 ABC
S3 S 20 DFG
L4 L 10 FFF
L4 L 20 GGG
F2 F 10 ABC
What I'm looking for is: rows having the same code, on the same level but different Root_IDs.
I created a query with the same table T twice and an inner join on both the Level and the Code. I tried this first, before trying to identify the different Root_IDs, but the returned results were wrong...
Here for example, the result should be:
ID Root_ID Level Code
S1 S 10 ABC
F2 F 10 ABC
Thanks for your help!
You can find the (code, level) combinations that have more than one distinct root_id in a subquery and then join it with the table to get the complete rows.
select a.*
from your_table as a
inner join (
select code, level
from your_table
group by code, level
having count(distinct root_id) > 1
) as b on a.code = b.code
and a.level = b.level
Try this
select t1.*
from your_table as t1
inner join (
select code, level
from your_table
group by code, level
having min(root_id) <> max(root_id)
) as t2 on t1.code = t2.code
and t1.level = t2.level
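Both answers hinge on grouping by (code, level); the min/max variant matters because Access doesn't support COUNT(DISTINCT ...). A sketch of the second query run through SQLite via Python (only to check the logic, since Access itself isn't scriptable this way; rows are sorted for a stable result order):

```python
# min(root_id) <> max(root_id) as a stand-in for count(distinct) > 1.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table (id TEXT, root_id TEXT, level INT, code TEXT);
INSERT INTO your_table VALUES
  ('S1','S',10,'ABC'),('S3','S',20,'DFG'),
  ('L4','L',10,'FFF'),('L4','L',20,'GGG'),
  ('F2','F',10,'ABC');
""")
rows = sorted(conn.execute("""
SELECT t1.*
FROM your_table AS t1
JOIN (
    SELECT code, level
    FROM your_table
    GROUP BY code, level
    HAVING MIN(root_id) <> MAX(root_id)
) AS t2 ON t1.code = t2.code AND t1.level = t2.level
""").fetchall())
print(rows)  # [('F2', 'F', 10, 'ABC'), ('S1', 'S', 10, 'ABC')]
```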

SQL join without losing rows

I have 2 tables with the same schema of userID, category, count. I need a query to sum the count of each userID/category pair. Sometimes a pair will exist in one table and not the other. I'm having trouble doing a join without losing the rows where a userID/category pair only exists in 1 table. This is what I'm trying (without success):
select a.user, a.category, count=a.count+b.count
from #temp1 a join #temp2 b
on a.user = b.user and a.category = b.category
Example:
Input:
user category count
id1 catB 3
id2 catG 9
id3 catW 17
user category count
id1 catB 1
id2 catM 5
id3 catW 13
Desired Output:
user category count
id1 catB 4
id2 catG 9
id2 catM 5
id3 catW 30
Update: "count" is not the actual column name. I just used it for the sake of this example, and I forgot it's a reserved word.
You need to:
Use a full outer join so you don't drop rows present in one table and not the other
Coalesce counts prior to addition, because 0 + NULL = NULL
Also, because COUNT is a reserved word, I would recommend escaping it.
So, using all of these guidelines, your query becomes:
SELECT COALESCE(a.user, b.user) AS user,
COALESCE(a.category, b.category) AS category,
COALESCE(a.[count],0) + COALESCE(b.[count],0) AS [count]
FROM #temp1 AS a
FULL OUTER JOIN #temp2 AS b
ON a.user = b.user AND
a.category = b.category
One way to approach this is with a full outer join:
select coalesce(a.user, b.user) as user,
coalesce(a.category, b.category) as category,
coalesce(a.count, 0) + coalesce(b.count, 0) as "count"
from #temp1 a full outer join
#temp2 b
on a.user = b.user and
a.category = b.category;
When using full outer join, you have to be careful because the key fields can be NULL when there is a match in only one table. As a result, the select tends to have a lot of coalesce()s (or similar constructs).
Another way is using a union all query with aggregation:
select "user", category, SUM("count") as "count"
from ((select "user", category, "count"
from #temp1
) union all
(select "user", category, "count"
from #temp2
)
) t
group by "user", category
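The union all form is also more portable (for instance, SQLite only gained FULL OUTER JOIN in release 3.39). A sketch of it via Python's sqlite3, with plain table names standing in for #temp1/#temp2 since SQLite has no # temp-table syntax:

```python
# Sum per user/category pair across two tables via UNION ALL + GROUP BY.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE temp1 ("user" TEXT, category TEXT, "count" INT);
CREATE TABLE temp2 ("user" TEXT, category TEXT, "count" INT);
INSERT INTO temp1 VALUES ('id1','catB',3),('id2','catG',9),('id3','catW',17);
INSERT INTO temp2 VALUES ('id1','catB',1),('id2','catM',5),('id3','catW',13);
""")
rows = conn.execute("""
SELECT "user", category, SUM("count") AS "count"
FROM (SELECT "user", category, "count" FROM temp1
      UNION ALL
      SELECT "user", category, "count" FROM temp2) t
GROUP BY "user", category
ORDER BY "user", category
""").fetchall()
print(rows)
# [('id1', 'catB', 4), ('id2', 'catG', 9), ('id2', 'catM', 5), ('id3', 'catW', 30)]
```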

SQL selecting adjacent rows for an adjacent set

I'm having trouble doing the following in SQL with Postgres. My program has an ordered set of numbers. In my database I have a table that stores all the numbers in rows, along with extra data. These rows are also in order.
For example, the set I need to find is:
1,5,6,1,3
The database has rows
row1 4
row2 5
row3 1
row4 5
row5 6
row6 1
row7 3
row8 2
row9 7
In the example above it's easy to see that my set is found from row3 to row7. Still, doing this in SQL is a mystery to me. I'm reading some articles on pivot tables, but I'm hoping there's an easier way.
Both data-sets need to have fields that identify the order.
And provided that the ordering column is a sequential consecutive set of numbers, then this is possible, although I doubt it's very quick.
Table 1          Table 2
id | value       id | value
 1 |   4          1 |   1
 2 |   5          2 |   5
 3 |   1          3 |   6
 4 |   5          4 |   1
 5 |   6          5 |   3
 6 |   1
 7 |   3
 8 |   2
 9 |   7
Then this query...
SELECT *
FROM table_1
INNER JOIN
(
    SELECT MIN(table_1.id) AS first_id,
           MAX(table_1.id) AS last_id
    FROM table_1
    INNER JOIN table_2 ON table_1.value = table_2.value
    GROUP BY table_1.id - table_2.id
    HAVING COUNT(*) = (SELECT COUNT(*) FROM table_2)
) AS matched_sets
ON matched_sets.first_id <= table_1.id
AND matched_sets.last_id >= table_1.id
Recursive version
#Dems beat me to it: a recursive CTE is the way to go here. It works for any sequence of numbers. I post my version because:
It does not require an additional table. Just supply your sequence of numbers as an array.
The recursive CTE itself is simpler.
The final query is smarter.
It actually works in PostgreSQL. #Dems' recursive version is not syntactically correct in its current state.
Test setup:
CREATE TEMP TABLE t (id int, val int);
INSERT INTO t VALUES
(1,4),(2,5),(3,1)
,(4,5),(5,6),(6,1)
,(7,3),(8,2),(9,7);
Call:
WITH RECURSIVE x AS (
SELECT '{1,5,6,1,3}'::int[] AS a
), y AS (
SELECT t.id AS start_id
,1::int AS step
FROM x
JOIN t ON t.val = x.a[1]
UNION ALL
SELECT y.start_id
,y.step + 1 -- AS step -- next step
FROM y
JOIN t ON t.id = y.start_id + step -- next id
JOIN x ON t.val = x.a[1 + step] -- next value
)
SELECT y.start_id
FROM x
JOIN y ON y.step = array_length(x.a, 1) -- only where last steps was matched
Result:
3
Static version
Works only for a predefined number of array items (5 in this case), but is faster for small arrays. Same test setup as above.
WITH x AS (
SELECT '{1,5,6,1,3}'::int[] AS a
)
SELECT t1.id
FROM x, t t1
JOIN t t2 ON t2.id = t1.id + 1
JOIN t t3 ON t3.id = t1.id + 2
JOIN t t4 ON t4.id = t1.id + 3
JOIN t t5 ON t5.id = t1.id + 4
WHERE t1.val = x.a[1]
AND t2.val = x.a[2]
AND t3.val = x.a[3]
AND t4.val = x.a[4]
AND t5.val = x.a[5];
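For comparison, outside SQL this search is just a sliding-window comparison; a small plain-Python sketch of what both versions compute, where the ids are the 1-based positions:

```python
# Find the 1-based start id of the needle sequence in the val column.
needle = [1, 5, 6, 1, 3]
vals = [4, 5, 1, 5, 6, 1, 3, 2, 7]  # t.val for ids 1..9

start_ids = [i + 1 for i in range(len(vals) - len(needle) + 1)
             if vals[i:i + len(needle)] == needle]
print(start_ids)  # [3]
```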
how about...
Select instr(',' & Group_Concat(mNumber SEPARATOR ',') &',',#yourstring)
FROM Table
Whoops, that's MySQL; have to look up the similar functions for PostgreSQL...
Postgresql Version of Group_concat
All this does is group multiple rows into one long string and then do a "find" to return the first position of your string in the generated long string. The returned number will match the row number. If 0 is returned, your string isn't in the generated one. (You may have to be cautious with the ',' comma separator.)
Recursive answer...
WITH CTE AS
(
    SELECT id AS first_id,
           id AS current_id,
           1 AS sequence_id
    FROM main_table
    WHERE value = (SELECT value FROM search_table WHERE id = 1)
    UNION ALL
    SELECT CTE.first_id,
           main_table.id,
           CTE.sequence_id + 1
    FROM CTE
    INNER JOIN main_table ON main_table.id = CTE.current_id + 1
    INNER JOIN search_table ON search_table.value = main_table.value
                           AND search_table.id = CTE.sequence_id + 1
)
SELECT *
FROM main_table
INNER JOIN CTE ON main_table.id >= CTE.first_id
              AND main_table.id <= CTE.current_id
WHERE CTE.sequence_id = (SELECT COUNT(*) FROM search_table)
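With the RECURSIVE keyword added (presumably the syntax issue noted in the other answer: Postgres and SQLite require it, T-SQL does not), this query does run; a check via Python's sqlite3 against the question's data:

```python
# Dems' recursive walk: anchor on rows matching the first search value,
# then extend each candidate run one id at a time while values match.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_table (id INT, value INT);
CREATE TABLE search_table (id INT, value INT);
INSERT INTO main_table VALUES
  (1,4),(2,5),(3,1),(4,5),(5,6),(6,1),(7,3),(8,2),(9,7);
INSERT INTO search_table VALUES (1,1),(2,5),(3,6),(4,1),(5,3);
""")
rows = conn.execute("""
WITH RECURSIVE CTE AS (
    SELECT id AS first_id, id AS current_id, 1 AS sequence_id
    FROM main_table
    WHERE value = (SELECT value FROM search_table WHERE id = 1)
    UNION ALL
    SELECT CTE.first_id, main_table.id, CTE.sequence_id + 1
    FROM CTE
    JOIN main_table ON main_table.id = CTE.current_id + 1
    JOIN search_table ON search_table.value = main_table.value
                     AND search_table.id = CTE.sequence_id + 1
)
SELECT main_table.id, main_table.value
FROM main_table
JOIN CTE ON main_table.id BETWEEN CTE.first_id AND CTE.current_id
WHERE CTE.sequence_id = (SELECT COUNT(*) FROM search_table)
ORDER BY main_table.id
""").fetchall()
print(rows)  # [(3, 1), (4, 5), (5, 6), (6, 1), (7, 3)]
```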