Self join queries in hive

I need a Hive query to fetch the hierarchy in which my product was sold.
Considering the records below, the end customers are 1 and 6, since their SoldTo column value is NULL.
CustomerID SoldTo
--------------------
1 NULL
2 1
3 2
4 3
5 4
6 NULL
7 1
8 6
My output should look like this:
c1 c2 c3 c4 c5
-------------------
5  4  3  2  1   (c1 = 5 is the first customer who bought the product, c5 = 1 is the last customer)
8  6            (c1 = 8 is the first customer, c2 = 6 is the last customer)
7  1

Hive has no real support for recursive CTEs or hierarchical data structures. You can do this using multiple joins, but the depth of the hierarchy is then fixed by the number of joins.
select t1.CustomerId as c1, t2.CustomerId as c2, t3.CustomerId as c3,
       t4.CustomerId as c4, t5.CustomerId as c5
from t t1 left join
     t t2
     on t2.SoldTo = t1.CustomerId left join
     t t3
     on t3.SoldTo = t2.CustomerId left join
     t t4
     on t4.SoldTo = t3.CustomerId left join
     t t5
     on t5.SoldTo = t4.CustomerId
-- start each chain at an end customer, i.e. a row whose SoldTo is NULL
where t1.SoldTo is null;
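The fixed-depth self-join idea can be checked against the question's sample data. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption, since the dialects differ), keeps the table name `t` from the answer, and walks each chain from the first buyer up to the end customer so that the columns come out in the order the question asks for. The `child` alias is a hypothetical name used to find customers that nobody was sold to afterwards.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (CustomerId INTEGER, SoldTo INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, None), (2, 1), (3, 2), (4, 3), (5, 4), (6, None), (7, 1), (8, 6)],
)

# Start at the "first" customers (nobody points at them through SoldTo)
# and follow SoldTo upwards, at most four steps.
query = """
SELECT t1.CustomerId AS c1, t2.CustomerId AS c2, t3.CustomerId AS c3,
       t4.CustomerId AS c4, t5.CustomerId AS c5
FROM t t1
LEFT JOIN t child ON child.SoldTo = t1.CustomerId
LEFT JOIN t t2 ON t2.CustomerId = t1.SoldTo
LEFT JOIN t t3 ON t3.CustomerId = t2.SoldTo
LEFT JOIN t t4 ON t4.CustomerId = t3.SoldTo
LEFT JOIN t t5 ON t5.CustomerId = t4.SoldTo
WHERE child.CustomerId IS NULL
ORDER BY t1.CustomerId
"""

chains = conn.execute(query).fetchall()
for chain in chains:
    print(chain)
```

Chains deeper than five customers would be cut off, which is the fixed-depth limitation the answer mentions; each extra level needs one more join.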

Related

SQL Server : select from multiple tables and calculate percentages

I have four tables and I would need to extract data from them to calculate the percentage
Table1
ID FK1 FK2
------------------
1 1 1
2 2 2
3 3 3
Table2
ID Name
------------------
1 L1
2 A1
3 B
Table3
ID FK3
------------------
1 1
2 2
3 3
Table4
ID Name
------------------
1 BA
2 N
3 CE
Now I need to get the Name from Table4, displayed as individual rows, and the Name from Table2, listed as individual columns. Each value is then a percentage of the records for that Table4 row:
Name L1 A1 B
---------------------------
BA 20(%) 40(%) 40(%)
N 30(%) 20(%) 30(%)
CE 15(%) 15(%) 70(%)
Because the tables are linked, here is the query I have so far:
select t4.Name
from table1 t1 (nolock)
join table2 t2 (nolock) on t1.FK1 = t2.ID
join table3 t3 (nolock) on t1.FK2 = t3.ID
join table4 t4 (nolock) on t3.FK3 = t4.ID
Does anyone have any idea how to do this? Thank you very much.
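One common way to get rows from one table and columns from another is conditional aggregation. The sketch below is a rough illustration rather than a definitive answer: it assumes FK1 points at Table2, FK2 at Table3, and Table3.FK3 at Table4, and it reads "percentage" as the share of a Table4 name's Table1 rows that belong to each Table2 name. It runs on SQLite through Python's sqlite3 instead of SQL Server, so the `(nolock)` hints are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ID INTEGER, FK1 INTEGER, FK2 INTEGER);
CREATE TABLE Table2 (ID INTEGER, Name TEXT);
CREATE TABLE Table3 (ID INTEGER, FK3 INTEGER);
CREATE TABLE Table4 (ID INTEGER, Name TEXT);
INSERT INTO Table1 VALUES (1, 1, 1), (2, 2, 2), (3, 3, 3);
INSERT INTO Table2 VALUES (1, 'L1'), (2, 'A1'), (3, 'B');
INSERT INTO Table3 VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO Table4 VALUES (1, 'BA'), (2, 'N'), (3, 'CE');
""")

# One output column per Table2 name; each is the percentage of the
# group's rows that carry that name.
query = """
SELECT t4.Name,
       100.0 * SUM(CASE WHEN t2.Name = 'L1' THEN 1 ELSE 0 END) / COUNT(*) AS L1,
       100.0 * SUM(CASE WHEN t2.Name = 'A1' THEN 1 ELSE 0 END) / COUNT(*) AS A1,
       100.0 * SUM(CASE WHEN t2.Name = 'B'  THEN 1 ELSE 0 END) / COUNT(*) AS B
FROM Table1 t1
JOIN Table2 t2 ON t1.FK1 = t2.ID
JOIN Table3 t3 ON t1.FK2 = t3.ID
JOIN Table4 t4 ON t3.FK3 = t4.ID
GROUP BY t4.ID, t4.Name
ORDER BY t4.ID
"""

percentages = conn.execute(query).fetchall()
for row in percentages:
    print(row)
```

With only the three sample rows from the question, every Table4 name maps to a single Table2 name, so each row shows 100 in one column and 0 in the others. The column list is hard-coded here; a variable set of Table2 names would need dynamic SQL.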

Join two tables, using value from the first unless it is null, otherwise use value from the second

I have three tables which look like this:
TABLE 1
id j_id
1 1
2 2
3 3
TABLE 2
id j_id table1_id
1 57 1
2 84 1
3 1 1
4 9 2
5 2 2
and every j has a value in a third table
id value
1 1abc
2 2bcd
3 3abc
57 57abc
84 84abc
9 9abc
I am trying to write a query which joins table 1 and table 2 and shows the value from the third table instead of the j_id. The problem is that I want to use the j value from the second table if it exists, and otherwise the value from the first table.
To make it clearer, this is my query result without using the third table:
tbl1.j_id tbl2.j_id
1 1
1 84
1 57
2 2
2 9
3 null
I want the end query result to use the second table's j value unless it is null:
tbl1.j_id tbl2.j_id j_id
1 1 1abc
1 84 84abc
1 57 57abc
2 2 2bcd
2 9 9abc
3 null 3abc
(Question and title edits are more than welcome; I wasn't sure how to phrase them.)

You can simply JOIN to table3 on the COALESCE of table2.j_id and table1.j_id:
SELECT t1.j_id AS t1_j_id, t2.j_id AS t2_j_id, t3.value
FROM table1 t1
LEFT JOIN table2 t2 ON t2.table1_id = t1.id
JOIN table3 t3 ON t3.id = COALESCE(t2.j_id, t1.j_id)
Output:
t1_j_id t2_j_id value
1 1 1abc
1 57 57abc
1 84 84abc
2 2 2bcd
2 9 9abc
3 null 3abc

One solution is to left join table3 twice:
select
t1.j_id,
t2.j_id,
coalesce(t31.value, t32.value) j_value
from
table1 t1
left join table2 t2 on t2.table1_id = t1.id
left join table3 t31 on t31.id = t2.j_id
left join table3 t32 on t32.id = t1.j_id
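Both suggestions for this question, joining table3 once on a COALESCE and left joining table3 twice, can be compared on the sample data. The sketch below is a minimal check, assuming SQLite (through Python's sqlite3) behaves like the asker's database for these joins; the ORDER BY clauses and the variable names are additions made only so that the two results can be compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, j_id INTEGER);
CREATE TABLE table2 (id INTEGER, j_id INTEGER, table1_id INTEGER);
CREATE TABLE table3 (id INTEGER, value TEXT);
INSERT INTO table1 VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO table2 VALUES (1, 57, 1), (2, 84, 1), (3, 1, 1), (4, 9, 2), (5, 2, 2);
INSERT INTO table3 VALUES (1, '1abc'), (2, '2bcd'), (3, '3abc'),
                          (57, '57abc'), (84, '84abc'), (9, '9abc');
""")

# Approach 1: a single join to table3 on whichever j_id is available.
coalesce_join = conn.execute("""
SELECT t1.j_id, t2.j_id, t3.value
FROM table1 t1
LEFT JOIN table2 t2 ON t2.table1_id = t1.id
JOIN table3 t3 ON t3.id = COALESCE(t2.j_id, t1.j_id)
ORDER BY t1.j_id, t2.j_id
""").fetchall()

# Approach 2: look up both candidate values, then pick the first non-NULL one.
double_join = conn.execute("""
SELECT t1.j_id, t2.j_id, COALESCE(t31.value, t32.value) AS j_value
FROM table1 t1
LEFT JOIN table2 t2 ON t2.table1_id = t1.id
LEFT JOIN table3 t31 ON t31.id = t2.j_id
LEFT JOIN table3 t32 ON t32.id = t1.j_id
ORDER BY t1.j_id, t2.j_id
""").fetchall()

for row in coalesce_join:
    print(row)
```

The two queries agree here because every j_id has a row in table3. If a j_id from table2 had no match in table3, the first query would drop that row (inner join), while the second would fall back to the value for table1's j_id.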

Compare multiple Rows Based on another table

Let's say I have the following tables:
Table1
ID Number
1 2
2 34
3 1 <---- Input (ID = 3) ==> (Number = 1)
4 6
5 5
*6* 7 <---- Want to find (ID = 6) because its Number (7) matches in Table2
7 22
and Table2
Number Code Att1 Att2 Att3
1 1 1 <-----|
1 2 1 2 <-----|
6 2 f 2 |
6 3 4 3 2 |
2 4 6 |---Match
22 5 2 2 2 |
5 2 h 3 b |
7 1 1 <-----|
7 2 1 2 <-----|
7 h 5 r
So here is my problem:
I want the IDs from Table1 whose Number has, in Table2, all the Code and attribute rows that a given (variable) input ID has. In the end I want to create a stored procedure/function that returns all IDs meeting that condition.
As an example:
Input ID: 3. This would return ID 6, because Number 7 (mapped from ID 6 in Table1) has all the rows that Number 1 (mapped from ID 3 in Table1) has. It has more, but that doesn't matter; it is only important that it has all the rows the input one has.
(I can't find a solution for comparing a set of rows to another set of rows that is not known beforehand.)
Thanks for any help!
Edit:
To make it more understandable, here is what I want in words, step by step:
1. Map the input ID to its Number in Table1.
2. Get all rows from Table2 that have the Number from step 1.
3. Get all Numbers that have the same rows as those from step 2 (they can have more).
4. Get the IDs for those Numbers (and return them).

Try something like this. I haven't tested it, but basically you inner join on all of the attributes that need to match. The HAVING clause is a crude check to make sure that every row of the input was matched. Edit: I forgot to add the WHERE clause for the input ID.
SELECT t1b.ID
FROM Table1 t1a
INNER JOIN Table2 t2a ON t1a.Number = t2a.Number
-- note: = never matches NULL, so empty attributes need extra handling
INNER JOIN Table2 t2b ON t2a.Number <> t2b.Number AND t2a.Code = t2b.Code AND t2a.Att1 = t2b.Att1 AND t2a.Att2 = t2b.Att2 AND t2a.Att3 = t2b.Att3
INNER JOIN Table1 t1b ON t1b.Number = t2b.Number
WHERE t1a.ID = 3
GROUP BY t1b.ID
-- compare against the number of Table2 rows that belong to the input ID
HAVING COUNT(*) = (SELECT COUNT(*) FROM Table2 t2 INNER JOIN Table1 t1 ON t1.Number = t2.Number WHERE t1.ID = 3)
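The join-and-count idea can be tried out on a small dataset. The sketch below is an illustration under several assumptions: it runs on SQLite through Python's sqlite3 rather than SQL Server, the Table2 rows are hypothetical values modelled on the question's example (the original table is not fully legible), rows are unique within a Number, and SQLite's `IS` operator is used so that empty (NULL) attributes compare as equal. The helper name `ids_covering` is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Number INTEGER);
CREATE TABLE Table2 (Number INTEGER, Code TEXT, Att1 TEXT, Att2 TEXT, Att3 TEXT);
""")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [(1, 2), (2, 34), (3, 1), (4, 6), (5, 5), (6, 7), (7, 22)],
)
# Hypothetical rows: Number 7 contains both rows of Number 1 plus one more.
conn.executemany(
    "INSERT INTO Table2 VALUES (?, ?, ?, ?, ?)",
    [
        (1, "1", "1", None, None),
        (1, "2", "1", "2", None),
        (6, "2", "f", "2", None),
        (7, "1", "1", None, None),
        (7, "2", "1", "2", None),
        (7, "h", "5", "r", None),
        (22, "5", "2", "2", "2"),
    ],
)

def ids_covering(input_id):
    """Return IDs whose Number has every Table2 row that input_id's Number has."""
    query = """
    SELECT t1b.ID
    FROM Table1 t1a
    JOIN Table2 t2a ON t2a.Number = t1a.Number
    JOIN Table2 t2b ON t2b.Number <> t2a.Number
                   AND t2b.Code IS t2a.Code
                   AND t2b.Att1 IS t2a.Att1
                   AND t2b.Att2 IS t2a.Att2
                   AND t2b.Att3 IS t2a.Att3
    JOIN Table1 t1b ON t1b.Number = t2b.Number
    WHERE t1a.ID = :input_id
    GROUP BY t1b.ID
    -- a candidate qualifies only if it matched every row of the input
    HAVING COUNT(*) = (SELECT COUNT(*)
                       FROM Table2 t2
                       JOIN Table1 t1 ON t1.Number = t2.Number
                       WHERE t1.ID = :input_id)
    ORDER BY t1b.ID
    """
    return [row[0] for row in conn.execute(query, {"input_id": input_id})]

print(ids_covering(3))
```

Going the other way, `ids_covering(6)` returns an empty list, because Number 1 has only two of the three rows that Number 7 has.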

select t11.ID as Id_To_Find,t12.ID as Id_Found
from Table1 t11
join (
select t21.Number as Found,t22.Number as ToFind from Table2 t21
left join Table2 t22 on t21.Code = t22.Code
and t21.Att1 = t22.Att1
and t21.Att2 = t22.Att2
and t21.Att3 = t22.Att3
and t21.Number <> t22.Number
group by t21.Number,t22.Number
having COUNT(*) = (select COUNT(*) from Table2 where Number = t22.Number))
as FindMatches
on t11.Number = FindMatches.ToFind
join Table1 t12 on t12.Number = FindMatches.Found

It is kind of hard to understand what you're trying to achieve. As I understood from your example, you want to match the Number for the input ID in Table1 with any column (correct?) in Table2.
With input ID = 3, the SELECT will return Number = 7. In the IN (...) condition, you can specify whichever columns in Table2 you want to match to Table1.Number.
DECLARE @Input INT = 3 -- Your input
SELECT DISTINCT t1.Number
FROM Table1 t
INNER JOIN Table2 t2 ON t.Number IN (t2.Number, t2.Code, t2.Att1, t2.Att2, t2.Att3)
INNER JOIN Table1 t1 ON t2.Number = t1.Number AND t.ID <> t1.ID
WHERE t.ID = @Input

Joining a table thru another property of another table that is linked with id

Edit: I made a mistake; the Invoices table carries the transaction ID.
I have 3 tables:
Transactions   Reconciliations      Invoices
id             num  line  transId   id  Code     transId
--             ---  ----  -------   --  -------  -------
3              1    1     3         5   Code 5   3
6              1    2     6         9   Code 9   8
7              1    3     7         12  Code 12  11
8              2    1     8
12             2    2     12
10             3    1     10
11             3    2     11
and this query:
select
    t1.id,   -- transaction id
    t2.num,  -- reconciliation number
    t3.Code  -- invoice code
from Transactions t1
left outer join Reconciliations t2 on t2.transId = t1.id
left outer join Invoices t3 on t3.transId = t1.id
Giving the following result :
id num code
-- --- ----
3 1 Code 5
6 1 null
7 1 null
8 2 Code 9
12 2 null
10 3 null
11 3 Code 12
But what I want is this :
id num code
-- --- ----
3 1 Code 5
6 1 Code 5
7 1 Code 5
8 2 Code 9
12 2 Code 9
10 3 Code 12
11 3 Code 12
To put it into words: when the linked Invoices table gives null, I want to join on all the records from Reconciliations with the same reconciliation number.
Edit: I would like the Code in Invoices to be shared across all transactions that share the same reconciliation number.
I have tried to do this through outer apply and a subquery, but I cannot figure out a way to achieve it. Do you have any idea?

The solution is to join to Reconciliations again before joining to Invoices:
select t.id, r.num, i.Code
from Transactions t
join Reconciliations r on r.transId = t.id
join Reconciliations r2 on r2.num = r.num
join Invoices i on i.transId = r2.transId
Note that the joins are now inner joins (requiring a match), and how you easily make the connection to the right Invoice via the shared Reconciliation.num value - using inner joins means you only get the invoice row that matches.
Edit: To cater for missing invoices
Use left join to invoices, but you need a group by with max() to limit the joins to just one invoice per transaction (without the max() you get lots of extra rows with null Code):
select t.id, r.num, max(i.Code) as Code
from Transactions t
join Reconciliations r on r.transId = t.id
join Reconciliations r2 on r2.num = r.num
left join Invoices i on i.transId = r2.transId
group by t.id, r.num
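The grouped variant can be run against the sample data from the question. The sketch below is a minimal check, assuming SQLite (through Python's sqlite3) as a stand-in for the asker's database; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions (id INTEGER);
CREATE TABLE Reconciliations (num INTEGER, line INTEGER, transId INTEGER);
CREATE TABLE Invoices (id INTEGER, Code TEXT, transId INTEGER);
INSERT INTO Transactions VALUES (3), (6), (7), (8), (12), (10), (11);
INSERT INTO Reconciliations VALUES
    (1, 1, 3), (1, 2, 6), (1, 3, 7),
    (2, 1, 8), (2, 2, 12),
    (3, 1, 10), (3, 2, 11);
INSERT INTO Invoices VALUES (5, 'Code 5', 3), (9, 'Code 9', 8), (12, 'Code 12', 11);
""")

# r finds the reconciliation number of each transaction, r2 fans out to all
# transactions sharing that number, and MAX() collapses the fan-out back to
# one row per transaction, ignoring the NULL codes.
query = """
SELECT t.id, r.num, MAX(i.Code) AS Code
FROM Transactions t
JOIN Reconciliations r ON r.transId = t.id
JOIN Reconciliations r2 ON r2.num = r.num
LEFT JOIN Invoices i ON i.transId = r2.transId
GROUP BY t.id, r.num
ORDER BY r.num, t.id
"""

shared_codes = conn.execute(query).fetchall()
for row in shared_codes:
    print(row)
```

If a reconciliation number had more than one invoice, MAX() would silently pick the code that sorts last, so this relies on there being at most one invoice per reconciliation.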

You seem to want to spread the InvoiceId in Transactions forward until the next non-NULL value.
Here is one method:
select t.*,
       (select top 1 InvoiceId
        from Transactions t2
        where t2.id <= t.id and t2.InvoiceId is not NULL
        order by id desc
       ) as newInvoiceId
from Transactions t;
You can then substitute this into your query:
select
    t1.id,   -- transaction id
    t2.num,  -- reconciliation number
    t3.Code  -- invoice code
from (select t.*,
             (select top 1 InvoiceId
              from Transactions t2
              where t2.id <= t.id and t2.InvoiceId is not NULL
              order by id desc
             ) as newInvoiceId
      from Transactions t
     ) t1
left outer join Reconciliations t2 on t2.transId = t1.id
left outer join Invoices t3 on t3.id = t1.newInvoiceId;

SELECT statement with multiple WHERE criteria (MS-Access)

Below is the sample data:
c1 c2 c3 c4 c5
1 a1 a 1 1
2 a2 a 2 1
3 a3 a 3 1
4 a4 a 4 1
5 b1 b 1 1
6 b2 b 2 1
7 b3 b 3 1
8 b4 b 4 1
9 a1 c 3 1
I want to get the details below:
c1 c2 c3 c4 c5
1 a1 a 1 1
5 b1 b 1 1
9 a1 c 3 1
c1 is the primary key. The criterion is: for any given unique c2, where c4 is the lowest, I want to return the contents (all 5 columns) of the row.

Try this:
SELECT t1.*
FROM Table1 t1
INNER JOIN
(
SELECT c3, MIN(c4) c4
FROM Table1
GROUP BY c3
) t2 ON t1.c3 = t2.c3 ANd t1.c4 = t2.c4
Update [1]: In SQL, the returned result is a set, in which the order is not guaranteed (unless you specify an ORDER BY clause, in which case it is a cursor). This is standard behavior. You should use an ORDER BY clause if you want to guarantee a specific order. In your case, the results are not guaranteed to be ordered as 1, 5, 9. Add ORDER BY c1 to guarantee that order.
The ORDER BY clause might be crucial in some cases. For example, if you want to get the top three rows, or the maximum one, you have to specify an ORDER BY clause.
So if you want a specific order, you have to specify an ORDER BY.
[1] As noted by @Fahim Parker; see the comments below.
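The join against the per-group minimum, with the ORDER BY added, can be checked on the sample rows. The sketch below assumes SQLite (through Python's sqlite3) instead of MS Access and keeps the table name Table1 from the answer. It also runs a correlated-subquery form of the same filter, shown only as an equivalent alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (c1 INTEGER PRIMARY KEY, c2 TEXT, c3 TEXT, c4 INTEGER, c5 INTEGER)"
)
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, "a1", "a", 1, 1), (2, "a2", "a", 2, 1), (3, "a3", "a", 3, 1),
        (4, "a4", "a", 4, 1), (5, "b1", "b", 1, 1), (6, "b2", "b", 2, 1),
        (7, "b3", "b", 3, 1), (8, "b4", "b", 4, 1), (9, "a1", "c", 3, 1),
    ],
)

# Join each row to the minimum c4 of its c3 group and keep only the matches.
via_join = conn.execute("""
SELECT t1.*
FROM Table1 t1
INNER JOIN (SELECT c3, MIN(c4) AS c4 FROM Table1 GROUP BY c3) t2
        ON t1.c3 = t2.c3 AND t1.c4 = t2.c4
ORDER BY t1.c1
""").fetchall()

# Same filter written as a correlated subquery.
via_subquery = conn.execute("""
SELECT t.*
FROM Table1 t
WHERE t.c4 = (SELECT MIN(f.c4) FROM Table1 f WHERE f.c3 = t.c3)
ORDER BY t.c1
""").fetchall()

for row in via_join:
    print(row)
```

If two rows in the same c3 group tie for the lowest c4, both forms return both rows.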

select c1, c2, c3, c4, c5
from Table1 as t
where t.c4 = (select min(f.c4) from Table1 as f where f.c3 = t.c3);
I hope that helps.
