Non Equi Join in hive

Can someone please help with the Hive query below? I know it won't work as written because Hive doesn't support non-equi joins.
SELECT a.ymd, a.price_close, b.price_close
FROM stocks a
JOIN stocks b ON a.ymd <= b.ymd
WHERE a.symbol = 'AAPL' AND b.symbol = 'IBM';

You can cross join then filter:
SELECT a.ymd, a.price_close, b.price_close
FROM
(select a.ymd, a.price_close from stocks a where a.symbol = 'AAPL') a
CROSS JOIN (select b.ymd, b.price_close from stocks b where b.symbol = 'IBM') b
WHERE a.ymd <= b.ymd;
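As a rough illustration of the cross-join-then-filter idea, here is a minimal sketch that runs the same query shape against an in-memory SQLite database rather than Hive. The table contents are invented sample values, and b.ymd is added to the select list only to make the filter visible.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption: only the
# query shape is being illustrated, not Hive's execution behaviour).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stocks (symbol TEXT, ymd TEXT, price_close REAL)")
conn.executemany(
    "INSERT INTO stocks VALUES (?, ?, ?)",
    [
        # Invented sample rows, not real prices.
        ("AAPL", "2010-01-04", 100.0),
        ("AAPL", "2010-01-05", 101.0),
        ("IBM", "2010-01-04", 200.0),
        ("IBM", "2010-01-05", 201.0),
    ],
)

# Filter each side first, cross join the two small sets, then apply the
# non-equi condition in WHERE instead of in ON.
query = """
SELECT a.ymd, a.price_close, b.ymd, b.price_close
FROM (SELECT ymd, price_close FROM stocks WHERE symbol = 'AAPL') a
CROSS JOIN (SELECT ymd, price_close FROM stocks WHERE symbol = 'IBM') b
WHERE a.ymd <= b.ymd
ORDER BY a.ymd, b.ymd
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Filtering by symbol inside the subqueries keeps the cross product small, which matters because a cross join pairs every row on one side with every row on the other.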

Related

Replace correlated subquery with CTE and JOIN

I am trying to rewrite a query that has a correlated subquery; the idea is to replace it with a CTE and join it later.
I have three tables: tbl_transactions, tbl_beneficiaries and tbl_reg_countries. The current SQL (shortened) looks like this:
SELECT
t.USER_ID,
t.TRANSACTION
FROM tbl_transactions t
JOIN tbl_beneficiaries b ON b.ID = t.USER_ID
WHERE b.COUNTRY NOT IN (
SELECT rc.country
FROM tbl_reg_countries rc
WHERE rc.id = t.USER_ID)
My goal is to return, for each user, only those transactions that happen outside of the user's registered countries. So a user may have registered countries X, Y and Z but done business with Q. In that case only Q should be returned. How could this be replaced with a CTE/JOIN?

I assume both tbl_beneficiaries.COUNTRY and tbl_reg_countries.COUNTRY are not nullable. You can use a LEFT JOIN with an IS NULL test to detect rows that never match:
SELECT
t.USER_ID,
t.TRANSACTION
FROM tbl_transactions t
JOIN tbl_beneficiaries b ON b.ID = t.USER_ID
LEFT JOIN tbl_reg_countries rc ON rc.id = t.USER_ID AND b.COUNTRY = rc.country
WHERE rc.country IS NULL
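To check that the LEFT JOIN ... IS NULL form returns the same rows as the correlated NOT IN, here is a small sketch on SQLite. The sample rows are invented, the column types are assumptions, and TRANSACTION is double-quoted because it is a keyword in SQLite. b.COUNTRY is added to the output only to show which country triggered the match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Column types and sample rows are assumptions made for this sketch.
CREATE TABLE tbl_transactions (USER_ID INTEGER, "TRANSACTION" TEXT);
CREATE TABLE tbl_beneficiaries (ID INTEGER, COUNTRY TEXT NOT NULL);
CREATE TABLE tbl_reg_countries (id INTEGER, country TEXT NOT NULL);
INSERT INTO tbl_transactions VALUES (1, 'T1'), (2, 'T2');
INSERT INTO tbl_beneficiaries VALUES (1, 'X'), (1, 'Q'), (2, 'Z');
INSERT INTO tbl_reg_countries VALUES (1, 'X'), (1, 'Y'), (1, 'Z'), (2, 'Z');
""")

# Original shape: correlated NOT IN subquery.
not_in_sql = """
SELECT t.USER_ID, t."TRANSACTION", b.COUNTRY
FROM tbl_transactions t
JOIN tbl_beneficiaries b ON b.ID = t.USER_ID
WHERE b.COUNTRY NOT IN (
    SELECT rc.country FROM tbl_reg_countries rc WHERE rc.id = t.USER_ID)
ORDER BY 1, 2, 3
"""

# Rewritten shape: anti-join via LEFT JOIN plus IS NULL.
anti_join_sql = """
SELECT t.USER_ID, t."TRANSACTION", b.COUNTRY
FROM tbl_transactions t
JOIN tbl_beneficiaries b ON b.ID = t.USER_ID
LEFT JOIN tbl_reg_countries rc
       ON rc.id = t.USER_ID AND b.COUNTRY = rc.country
WHERE rc.country IS NULL
ORDER BY 1, 2, 3
"""

not_in_rows = conn.execute(not_in_sql).fetchall()
anti_join_rows = conn.execute(anti_join_sql).fetchall()
print(not_in_rows)
print(anti_join_rows)
```

The two forms only agree when the country columns contain no NULLs, which is why the NOT NULL assumption is stated up front.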

I would try rewriting the query with a WITH clause, like this:
With a As
(Select
Distinct rc.country
From tbl_reg_countries rc
Inner Join tbl_transactions t on rc.id = t.USER_ID
)
Select
t.USER_ID,
t.TRANSACTION
From tbl_transactions t
Inner Join tbl_beneficiaries b On b.ID = t.USER_ID
Where b.COUNTRY Not In (select * from a)

ERROR: invalid reference to FROM-clause entry for table "oth"

I have a problem with this query
SELECT DISTINCT(oth.book) FROM book_meta_keywords oth,
(SELECT bmk.meta_keyword AS metaKeyword, bmk.book AS book FROM books b
INNER JOIN customers_books cvb ON cvb.book = b.id
INNER JOIN book_meta_keywords bmk ON bmk.book = b.id
WHERE cvb.customer = 1 ) AS allCustomerPurchasedBooksMeta
INNER JOIN books b ON b.id = oth.book
WHERE oth.meta_keyword = allCustomerPurchasedBooksMeta.metaKeyword AND oth.book != allCustomerPurchasedBooksMeta.book AND b.status = 'GOOD'
I am getting the error below for this query.
ERROR: invalid reference to FROM-clause entry for table "oth"
LINE 6: INNER JOIN books b ON b.id = oth.book
^
HINT: There is an entry for table "oth", but it cannot be referenced from this part of the query.
, Time: 0.002000s
But if I run the query below, it works:
SELECT DISTINCT(oth.book) FROM book_meta_keywords oth,
(SELECT bmk.meta_keyword AS metaKeyword, bmk.book AS book FROM books b
INNER JOIN customers_books cvb ON cvb.book = b.id
INNER JOIN book_meta_keywords bmk ON bmk.book = b.id
WHERE cvb.customer = 1 ) AS allCustomerPurchasedBooksMeta
WHERE oth.meta_keyword = allCustomerPurchasedBooksMeta.metaKeyword AND oth.book != allCustomerPurchasedBooksMeta.book
Can anyone help me understand why? The query is basically trying to find similar books, based on the meta keywords of the books a customer has purchased.
Thanks.

This is your FROM clause:
FROM
book_meta_keywords oth,
(SELECT ... FROM ... WHERE ...) AS allCustomerPurchasedBooksMeta
INNER JOIN books b ON b.id = oth.book
You are mixing explicit and implicit joins (the latter is denoted by the comma). Don't. They have different precedence rules: an explicit JOIN binds more tightly than the comma, so the ON condition of the second join is evaluated before oth is in scope.
As for how to solve this: assuming that the logic is indeed what you want, that's a lateral join:
FROM
book_meta_keywords oth
CROSS JOIN LATERAL (SELECT ... FROM ... WHERE ...) AS allCustomerPurchasedBooksMeta
INNER JOIN books b ON b.id = oth.book
I suspect, however, that your query could be further simplified. You might want to ask another question for this, explaining the purpose of the query and providing a minimal reproducible example.

You are missing an explicit join:
SELECT DISTINCT oth.book FROM book_meta_keywords oth join
(SELECT bmk.meta_keyword AS metaKeyword, bmk.book AS book FROM books b
INNER JOIN customers_books cvb ON cvb.book = b.id
INNER JOIN book_meta_keywords bmk ON bmk.book = b.id
WHERE cvb.customer = 1 ) AS allCustomerPurchasedBooksMeta
on oth.meta_keyword = allCustomerPurchasedBooksMeta.metaKeyword and
oth.book != allCustomerPurchasedBooksMeta.book
INNER JOIN books b ON b.id = oth.book
WHERE b.status = 'GOOD'
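For anyone who wants to try the all-explicit-joins version, here is a minimal sketch on SQLite with invented sample data. It only exercises the rewritten query; it does not reproduce the PostgreSQL error from the question. The column types and the ORDER BY are additions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Invented sample data: customer 1 bought book 1, which is tagged 'sql'.
CREATE TABLE books (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE customers_books (customer INTEGER, book INTEGER);
CREATE TABLE book_meta_keywords (book INTEGER, meta_keyword TEXT);
INSERT INTO books VALUES (1, 'GOOD'), (2, 'GOOD'), (3, 'BAD'), (4, 'GOOD');
INSERT INTO customers_books VALUES (1, 1);
INSERT INTO book_meta_keywords VALUES
    (1, 'sql'), (2, 'sql'), (3, 'sql'), (4, 'python');
""")

# Every table is attached with an explicit JOIN ... ON, so oth is already
# in scope when the final join to books refers to it.
query = """
SELECT DISTINCT oth.book
FROM book_meta_keywords oth
JOIN (SELECT bmk.meta_keyword AS metaKeyword, bmk.book AS book
      FROM books b
      INNER JOIN customers_books cvb ON cvb.book = b.id
      INNER JOIN book_meta_keywords bmk ON bmk.book = b.id
      WHERE cvb.customer = 1) AS allCustomerPurchasedBooksMeta
  ON oth.meta_keyword = allCustomerPurchasedBooksMeta.metaKeyword
 AND oth.book != allCustomerPurchasedBooksMeta.book
INNER JOIN books b ON b.id = oth.book
WHERE b.status = 'GOOD'
ORDER BY oth.book
"""
similar_books = [row[0] for row in conn.execute(query)]
print(similar_books)
```

With this data, book 2 shares the 'sql' keyword and has status GOOD, book 3 shares the keyword but is filtered out by status, and book 4 has no keyword in common.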

Well, it can work:
SELECT DISTINCT oth.book
FROM book_meta_keywords oth
INNER JOIN books b ON b.id = oth.book
, (SELECT bmk.meta_keyword AS metaKeyword, bmk.book AS book
FROM books b
INNER JOIN customers_books cvb ON cvb.book = b.id
INNER JOIN book_meta_keywords bmk ON bmk.book = b.id
WHERE cvb.customer = 1 ) AS allCustomerPurchasedBooksMeta
WHERE oth.meta_keyword = allCustomerPurchasedBooksMeta.metaKeyword
AND oth.book != allCustomerPurchasedBooksMeta.book
AND b.status = 'GOOD'
But does this do what you need?

How do you work out the average of a sum function within a temp table?

I have created a temp table that lists each client's invoice(s), plus the number of days it took to pay the invoice. A client can have more than one invoice.
Instead of this, I would just like the temp table to list each client once, along with the AVERAGE number of days it took to pay all of their invoices.
Any tips on how to do this would be much appreciated.
Thanks
select
c.client_code,
b.bill_num,
b.bill_date,
ba.TRAN_DATE,
sum(datediff(Day,b.BILL_DATE, ba.TRAN_DATE)) as Days_To_Pay
into #tempG1
from blt_bill b
left outer join blt_billm bm on b.tran_uno = bm.bill_tran_uno
left outer join BLT_BILL_AMT ba on bm.BILLM_UNO = ba.BILLM_UNO
left outer join hbm_matter m on bm.matter_uno = m.matter_uno
left outer join hbm_client c on m.client_uno = c.client_uno
where b.total_bill_amt > 0.0
and bm.ar_status NOT IN ('P','X')
and ba.TRAN_TYPE in ('CR','crx')
group by c.client_code,b.bill_num,b.bill_date,ba.TRAN_DATE
select * from #tempG1
Drop Table #tempG1

I am not familiar with temp tables, but this should work (tested on a similar scenario on MySQL 8, and assuming that #tempG1 returns results):
select
c.client_code,
b.bill_num,
b.bill_date,
ba.TRAN_DATE,
sum(datediff(Day,b.BILL_DATE, ba.TRAN_DATE)) as Days_To_Pay
into #tempG1
from blt_bill b
left outer join blt_billm bm on b.tran_uno = bm.bill_tran_uno
left outer join BLT_BILL_AMT ba on bm.BILLM_UNO = ba.BILLM_UNO
left outer join hbm_matter m on bm.matter_uno = m.matter_uno
left outer join hbm_client c on m.client_uno = c.client_uno
where b.total_bill_amt > 0.0
and bm.ar_status NOT IN ('P','X')
and ba.TRAN_TYPE in ('CR','crx')
group by c.client_code,b.bill_num,b.bill_date,ba.TRAN_DATE
-- Average the per-invoice days for each client
SELECT temp.client_code, AVG(temp.Days_To_Pay)
FROM (select * from #tempG1) as temp
GROUP BY temp.client_code
-- Do you still see results if you drop the table here?
Drop Table #tempG1
Note that the INTO #tempG1 clause has to come right after the select list, before FROM. This might not be what you want to achieve; I am not sure whether you want to include your JOIN conditions or not.
Or you could do it without a temp table (including your join conditions):
SELECT temp.client_code, AVG(temp.Days_To_Pay)
FROM (
select
c.client_code,
b.bill_num,
b.bill_date,
ba.TRAN_DATE,
sum(datediff(Day,b.BILL_DATE, ba.TRAN_DATE)) as Days_To_Pay
from blt_bill b
left outer join blt_billm bm on b.tran_uno = bm.bill_tran_uno
left outer join BLT_BILL_AMT ba on bm.BILLM_UNO = ba.BILLM_UNO
left outer join hbm_matter m on bm.matter_uno = m.matter_uno
left outer join hbm_client c on m.client_uno = c.client_uno
where b.total_bill_amt > 0.0
and bm.ar_status NOT IN ('P','X')
and ba.TRAN_TYPE in ('CR','crx')
group by c.client_code,b.bill_num,b.bill_date,ba.TRAN_DATE
) as temp
GROUP BY temp.client_code
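Here is a minimal sketch of the "aggregate per invoice, then average per client" shape, run on SQLite. It makes several assumptions: the five source tables are collapsed into one hypothetical invoices table, the sample dates are invented, and SQLite's julianday() difference stands in for SQL Server's datediff(Day, ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical single table replacing the joined billing tables.
CREATE TABLE invoices (
    client_code TEXT, bill_num INTEGER, bill_date TEXT, tran_date TEXT);
INSERT INTO invoices VALUES
    ('A', 1, '2020-01-01', '2020-01-11'),  -- paid after 10 days
    ('A', 2, '2020-02-01', '2020-02-21'),  -- paid after 20 days
    ('B', 3, '2020-03-01', '2020-03-06');  -- paid after 5 days
""")

# Inner query: one row per invoice with its days to pay.
# Outer query: one row per client with the average over its invoices.
query = """
SELECT temp.client_code, AVG(temp.days_to_pay) AS avg_days_to_pay
FROM (
    SELECT client_code, bill_num,
           SUM(julianday(tran_date) - julianday(bill_date)) AS days_to_pay
    FROM invoices
    GROUP BY client_code, bill_num
) AS temp
GROUP BY temp.client_code
ORDER BY temp.client_code
"""
averages = conn.execute(query).fetchall()
print(averages)
```

Client A has invoices paid after 10 and 20 days, so its average is 15; client B has a single invoice paid after 5 days.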

This sounds like a simple aggregation:
select c.client_code, avg(datediff(Day, b.BILL_DATE, ba.TRAN_DATE)) as Days_To_Pay
from blt_bill b join
blt_billm bm
on b.tran_uno = bm.bill_tran_uno join
BLT_BILL_AMT ba
on bm.BILLM_UNO = ba.BILLM_UNO join
hbm_matter m on bm.matter_uno = m.matter_uno join
hbm_client c
on m.client_uno = c.client_uno
where b.total_bill_amt > 0.0 and
bm.ar_status not in ('P', 'X') and
ba.TRAN_TYPE in ('CR', 'crx')
group by c.client_code;
Note that you do not need outer joins. The where clause is turning most of them into inner joins anyway. Plus, if you are aggregating by the client code, then presumably you want a non-NULL value.
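The point about the where clause can be checked with a small sketch on SQLite. The bill and payment tables here are hypothetical two-column stand-ins, not the real schema. Filtering on a column of the outer-joined table in WHERE throws away the NULL-extended rows, so the result matches an inner join; moving the same filter into ON keeps them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-ins: bill 2 has no payment row at all.
CREATE TABLE bill (bill_num INTEGER);
CREATE TABLE payment (bill_num INTEGER, tran_type TEXT);
INSERT INTO bill VALUES (1), (2);
INSERT INTO payment VALUES (1, 'CR');
""")

def run(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# LEFT JOIN, but the filter on the right-hand table sits in WHERE.
left_where = run("""
SELECT b.bill_num, p.tran_type
FROM bill b LEFT JOIN payment p ON p.bill_num = b.bill_num
WHERE p.tran_type IN ('CR') ORDER BY b.bill_num""")

# Plain inner join with the same filter.
inner = run("""
SELECT b.bill_num, p.tran_type
FROM bill b JOIN payment p ON p.bill_num = b.bill_num
WHERE p.tran_type IN ('CR') ORDER BY b.bill_num""")

# LEFT JOIN with the filter moved into ON: unmatched bills survive.
left_on = run("""
SELECT b.bill_num, p.tran_type
FROM bill b LEFT JOIN payment p
  ON p.bill_num = b.bill_num AND p.tran_type IN ('CR')
ORDER BY b.bill_num""")

print(left_where, inner, left_on)
```

Bill 2 appears only in the last result, with a NULL tran_type, because no WHERE filter rejects its NULL-extended row.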

How to construct a SQL sub query in SQL Server 2008?

I have a requirement to extract all rows from one table, ci_periodicBillings, but only for clients that have rows within a particular date range in another table, ci_invoiceHeaders. I am using MS SQL Server 2008, connecting via ODBC.
I have created a subquery that works, but only if there is exactly 1 matching row in ci_periodicBillings. If there is more than 1 row in ci_periodicBillings, the result multiplies those rows by the number of rows meeting the criteria in ci_invoiceHeaders.
I want to show only the rows from ci_periodicBillings, without any multiplication, if the criteria are met in ci_invoiceHeaders. I'm sure there is an easy solution to this but I can't see the wood for the trees at the moment.
There are a few other tables used for listing purposes only (i.e. facilities/clients etc)
SQL is here:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT
b.name,
b.forename,
b.surname,
a.client,
cast(a.BILLSTART as DATE) as BILLSTART,
cast(a.ENDBILL as DATE) as ENDBILL,
a.RATE
FROM ci_periodicBillings as a
inner join
(select f.name,
c.surname,c.forename,ih.client,ih.invoiceDate
FROM ci_invoiceHeaders ih
LEFT JOIN ci_invoiceDetails id ON ih.invoiceNo = id.id
INNER JOIn cs_clients c ON ih.client = c.guid
INNER JOIN cs_facilities f ON c.facility = f.guid
group by f.name, c.surname,
c.forename, ih.client, ih.invoiceDate)
as b
on a.client = b.client
WHERE b.invoiceDate between '2017-08-01' and '2018-01-31'
order by a.client
Any ideas please?

Try this:
SELECT b.name, b.forename, b.surname, a.client,
cast(a.BILLSTART AS DATE) AS BILLSTART,
cast(a.ENDBILL AS DATE) AS ENDBILL, a.RATE
FROM ci_periodicBillings AS a inner join
(SELECT f.name, c.surname,c.forename,ih.client,CAST(ih.invoiceDate AS DATE) invoiceDate
FROM ci_invoiceHeaders ih
LEFT JOIN ci_invoiceDetails id ON ih.invoiceNo = id.id
INNER JOIN cs_clients c ON ih.client = c.guid
INNER JOIN cs_facilities f ON c.facility = f.guid
WHERE ih.invoiceDate BETWEEN '2017-08-01' AND '2018-01-31'
GROUP BY f.name, c.surname,c.forename,ih.client,CAST(ih.invoiceDate AS DATE)) AS b
ON a.client = b.client
ORDER BY a.client;
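As a side note that is not part of the answer above: one common way to avoid the row multiplication described in the question is to test for matching invoices with EXISTS instead of joining to them. Below is a minimal sketch on SQLite with two simplified, hypothetical versions of the tables (only the client, RATE and invoiceDate columns) and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins: client 1 has 2 billing rows and 3 invoices in
-- range; client 2 has only an out-of-range invoice.
CREATE TABLE ci_periodicBillings (client INTEGER, RATE REAL);
CREATE TABLE ci_invoiceHeaders (client INTEGER, invoiceDate TEXT);
INSERT INTO ci_periodicBillings VALUES (1, 10.0), (1, 20.0), (2, 30.0);
INSERT INTO ci_invoiceHeaders VALUES
    (1, '2017-09-01'), (1, '2017-10-01'), (1, '2017-11-01'),
    (2, '2016-01-01');
""")

# Joining repeats each billing row once per matching invoice.
joined = conn.execute("""
SELECT a.client, a.RATE
FROM ci_periodicBillings a
JOIN ci_invoiceHeaders ih ON ih.client = a.client
WHERE ih.invoiceDate BETWEEN '2017-08-01' AND '2018-01-31'
ORDER BY a.client, a.RATE
""").fetchall()

# EXISTS only asks whether at least one matching invoice exists, so each
# billing row appears at most once.
semi_joined = conn.execute("""
SELECT a.client, a.RATE
FROM ci_periodicBillings a
WHERE EXISTS (
    SELECT 1 FROM ci_invoiceHeaders ih
    WHERE ih.client = a.client
      AND ih.invoiceDate BETWEEN '2017-08-01' AND '2018-01-31')
ORDER BY a.client, a.RATE
""").fetchall()

print(len(joined), semi_joined)
```

If the client and facility names are still needed in the output, they could presumably be joined in from cs_clients and cs_facilities directly, since those tables have one row per client and facility rather than one per invoice.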

avoid repeating condition in select query

I have the following query to be executed on postgresql
SELECT COUNT(DISTINCT id_client) FROM contract c
INNER JOIN bundle b ON c.bundle_id = b.id
INNER JOIN payment_method pm ON pm.id = c.payment_method_id
WHERE country_id=1 AND b.platform_id=1 AND pm.name <> 'RIB'
AND CONDITION_1
AND id_client NOT IN (
SELECT id_client FROM contract c1
INNER JOIN bundle b1 ON (c1.bundle_id = b1.id)
INNER JOIN payment_method pm1 ON pm1.id = c1.payment_method_id
WHERE c1.country_id=1 AND b1.platform_id=1 AND pm1.name <> 'RIB'
AND CONDITION_2);
I don't like it because it's the same query repeated twice, except for CONDITION_1 and CONDITION_2 (and I have another example where it's repeated 3 times).
It's also very slow.
I tried to rewrite it as the following:
WITH
filter_cpm AS (
SELECT * FROM contract c
INNER JOIN bundle b ON b.id = c.bundle_id
INNER JOIN payment_method pm ON pm.id = c.payment_method_id
WHERE c.country_id = 1 AND b.platform_id = 1 AND pm.name <> 'RIB'
)
SELECT COUNT(DISTINCT id_client) FROM filter_cpm
WHERE CONDITION_1
AND id_client NOT IN (
SELECT id_client FROM filter_cpm
WHERE CONDITION_2);
Now it's DRY but it's two times slower.
How can I re-write the query to have the same (or better) performance?
EDIT: I cannot simply combine the two conditions with AND. For example, if CONDITION_1 and CONDITION_2 are about VIP status, then I want to select clients who were re-qualified from NOT VIP to VIP.

You can select from the common table expression twice, using an outer join:
WITH filter_cpm AS (SELECT *
FROM CONTRACT c
INNER JOIN BUNDLE b
ON b.ID = c.BUNDLE_ID
INNER JOIN PAYMENT_METHOD pm
ON pm.ID = c.PAYMENT_METHOD_ID
WHERE c.COUNTRY_ID = 1 AND
b.PLATFORM_ID = 1 AND
pm.NAME <> 'RIB')
SELECT COUNT(DISTINCT fc1.ID_CLIENT)
FROM filter_cpm fc1
LEFT OUTER JOIN filter_cpm fc2
ON fc2.ID_CLIENT = fc1.ID_CLIENT AND
CONDITION_2
WHERE fc1.CONDITION_1 AND
fc2.ID_CLIENT IS NULL
Best of luck.
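A small sketch of the CTE self anti-join on SQLite, under loud assumptions: the bundle and payment_method joins are dropped, the contract table gets two hypothetical columns (contract_year, is_vip), CONDITION_1 is taken to mean "VIP in 2020" and CONDITION_2 "VIP in 2019". The NOT IN form is run alongside it to check that both return the same count. It says nothing about relative speed on PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical columns and invented rows, for illustration only.
CREATE TABLE contract (
    id_client INTEGER, country_id INTEGER,
    contract_year INTEGER, is_vip INTEGER);
INSERT INTO contract VALUES
    (1, 1, 2019, 0), (1, 1, 2020, 1),  -- re-qualified to VIP
    (2, 1, 2019, 1), (2, 1, 2020, 1),  -- VIP in both years
    (3, 1, 2020, 1),                   -- VIP in 2020 only
    (4, 2, 2020, 1);                   -- other country, filtered by CTE
""")

cte = "WITH filter_cpm AS (SELECT * FROM contract WHERE country_id = 1) "

# Anti-join: keep fc1 rows that have no fc2 partner meeting CONDITION_2.
anti_join_count = conn.execute(cte + """
SELECT COUNT(DISTINCT fc1.id_client)
FROM filter_cpm fc1
LEFT OUTER JOIN filter_cpm fc2
  ON fc2.id_client = fc1.id_client
 AND fc2.contract_year = 2019 AND fc2.is_vip = 1
WHERE fc1.contract_year = 2020 AND fc1.is_vip = 1
  AND fc2.id_client IS NULL
""").fetchone()[0]

# Same logic written with NOT IN, as in the question.
not_in_count = conn.execute(cte + """
SELECT COUNT(DISTINCT id_client)
FROM filter_cpm
WHERE contract_year = 2020 AND is_vip = 1
  AND id_client NOT IN (
      SELECT id_client FROM filter_cpm
      WHERE contract_year = 2019 AND is_vip = 1)
""").fetchone()[0]

print(anti_join_count, not_in_count)
```

Clients 1 and 3 are counted: both are VIP in 2020 and neither has a VIP row for 2019.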
